Data analysis cannot begin until the data are in the proper format, yet wrangling data into that format can be very challenging. There are many ways to do it, from cleaning the data by hand, which is tedious but may yield higher quality, to using computer algorithms that speed up the process. This talk will introduce the concept of data warehousing and a technique known as Extract-Transform-Load (ETL).
The talk will also cover record linkage techniques for combining data from disparate data sets, and will discuss Coupler, a software package that can be used to accomplish these tasks.
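
To give a flavor of these ideas, the following is a minimal Python sketch of an ETL pass followed by a simple record linkage step. It is an illustration only and does not show how Coupler works. The sample data, the table names (`clinic`, `lab`), the function names, and the 0.8 similarity threshold are all assumptions made up for this example. It uses an in-memory SQLite database as a stand-in for a data warehouse, and links records by requiring an exact match on birth date and an approximate match on name.

```python
import csv
import io
import sqlite3
from difflib import SequenceMatcher

# Hypothetical sample data standing in for two disparate sources.
CLINIC_CSV = (
    "patient_id,name,birth_date\n"
    "1,John  SMITH,1980-01-02\n"
    "2,Maria Lopez,1975-06-30\n"
)
LAB_CSV = (
    "sample_id,patient_name,dob\n"
    "A7,jon smith,1980-01-02\n"
    "B3,Maria Lopes,1975-06-30\n"
    "C9,Wei Chen,1990-11-11\n"
)


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, key_field, name_field, date_field):
    """Transform: map source-specific columns onto one shared schema."""
    return [
        (
            row[key_field].strip(),
            " ".join(row[name_field].lower().split()),  # lowercase, collapse spaces
            row[date_field].strip(),
        )
        for row in rows
    ]


def load(conn, table, records):
    """Load: write the cleaned records into a warehouse table.

    The table name is interpolated directly, so it must come from trusted code.
    """
    conn.execute(
        f"CREATE TABLE {table} (source_key TEXT, name TEXT, birth_date TEXT)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", records)


def link_records(conn, threshold=0.8):
    """Link records: exact match on birth date, fuzzy match on name."""
    candidates = conn.execute(
        "SELECT c.source_key, l.source_key, c.name, l.name "
        "FROM clinic AS c JOIN lab AS l ON c.birth_date = l.birth_date "
        "ORDER BY c.source_key, l.source_key"
    )
    return [
        (clinic_key, lab_key)
        for clinic_key, lab_key, clinic_name, lab_name in candidates
        if SequenceMatcher(None, clinic_name, lab_name).ratio() >= threshold
    ]


def run_pipeline():
    """Run extract, transform, load for both sources, then link them."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load(conn, "clinic",
         transform(extract(CLINIC_CSV), "patient_id", "name", "birth_date"))
    load(conn, "lab",
         transform(extract(LAB_CSV), "sample_id", "patient_name", "dob"))
    return link_records(conn)


if __name__ == "__main__":
    for clinic_key, lab_key in run_pipeline():
        print(f"clinic record {clinic_key} <-> lab record {lab_key}")
```

Joining on birth date first keeps the number of name comparisons small, an approach often called blocking. Real record linkage usually needs more careful handling of typos, missing values, and ambiguous matches than this sketch provides.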