CEPR uniform extracts of the SIPP

Basics

The CEPR extracts are Stata data files. The files are organized by panel and then by thematic sets. Each set is a data file containing observations in person-month or person-wave format, depending on whether the set is based primarily on core or topical module data. If the structure of the SIPP is unfamiliar to you, please first read about the survey.

Structure of the extracts

CEPR's data files are organized by panel and then by the "set" of thematic content. For example, Set B contains basic demographic information such as ethnicity and geography, and Set E contains child care usage. Set B is generally drawn from information that is collected for every month, so Set B has one observation per person-month. Set E is drawn from topical modules on child care, which are asked in only a wave or two, so Set E has one observation per person-wave. You can read more about the content of each set at ceprDATA.org.

Each person in CEPR's SIPP extracts is assigned a unique ID that remains the same throughout the entire panel. In a core-based set like Set B, observations are uniquely identified by ID, wave, and reference month -- the variables id wave srefmon. In a topical module-based set like Set E, you can uniquely identify observations by ID and wave numbers -- the variables id wave.
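The identifying variables can be checked before doing any further work. The following is a minimal sketch in plain Python, using invented toy records rather than real extract data; only the variable names id, wave, and srefmon come from the description above, and the helper duplicate_keys is hypothetical.

```python
from collections import Counter

# Toy person-month records standing in for a core-based set such as Set B.
# The values are invented; only the key names id, wave, srefmon come from the text.
core_rows = [
    {"id": "A1", "wave": 1, "srefmon": 1},
    {"id": "A1", "wave": 1, "srefmon": 2},
    {"id": "A1", "wave": 2, "srefmon": 1},
    {"id": "B2", "wave": 1, "srefmon": 1},
]


def duplicate_keys(rows, key_vars):
    """Return the key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[k] for k in key_vars) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# In a person-month file the full key identifies each observation ...
full_key_dups = duplicate_keys(core_rows, ["id", "wave", "srefmon"])
# ... while id and wave alone do not, because each wave covers several months.
short_key_dups = duplicate_keys(core_rows, ["id", "wave"])

print(full_key_dups)   # []
print(short_key_dups)  # [('A1', 1)]
```

In Stata itself, the isid command performs the same kind of check on a loaded data file, for example isid id wave srefmon.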

This means that within a given panel, you can match people across sets. For example, using the unique identifiers id wave srefmon, you can merge the marital status variables in Set C of the 2001 panel with the income variables in Set F of the 2001 panel, creating a new data set which you can use to investigate how income varies by marital status.
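The logic of such a merge can be sketched in plain Python as a one-to-one match on the three identifiers. This is an illustration under stated assumptions, not CEPR's procedure: the records are invented, the variable names married and income are hypothetical stand-ins for the actual Set C and Set F variables, and only matched observations are kept.

```python
# Toy stand-ins for Set C (marital status) and Set F (income) of one panel.
# "married" and "income" are hypothetical variable names.
set_c = [
    {"id": "A1", "wave": 1, "srefmon": 1, "married": 1},
    {"id": "A1", "wave": 1, "srefmon": 2, "married": 1},
    {"id": "B2", "wave": 1, "srefmon": 1, "married": 0},
]
set_f = [
    {"id": "A1", "wave": 1, "srefmon": 1, "income": 2500},
    {"id": "A1", "wave": 1, "srefmon": 2, "income": 2600},
    {"id": "B2", "wave": 1, "srefmon": 1, "income": 1800},
]

KEY = ("id", "wave", "srefmon")


def merge_one_to_one(left, right, key=KEY):
    """Match rows of left and right on key, keeping matched rows only."""
    index = {}
    for row in right:
        k = tuple(row[v] for v in key)
        if k in index:
            # A repeated key means the match would not be one-to-one.
            raise ValueError(f"duplicate key in right-hand data: {k}")
        index[k] = row
    merged = []
    for row in left:
        k = tuple(row[v] for v in key)
        if k in index:
            merged.append({**row, **index[k]})
    return merged


merged = merge_one_to_one(set_c, set_f)
for row in merged:
    print(row["id"], row["wave"], row["srefmon"], row["married"], row["income"])
```

In Stata, the corresponding operation is a merge 1:1 on id wave srefmon, with one set in memory and the other named after using.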

You could also extend this data set by using other panels. Repeat the merge of Sets C and F described above for the 1996 and 1993 panels, then append each result to the 2001 data to yield a marital status-income microdata set for the decade of 1993-2003. Some specific examples of merging and appending data are provided here.
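Appending panels amounts to stacking the merged files on top of one another. The sketch below uses invented toy rows and adds a hypothetical panel variable to each record. This rests on an assumption: the text says an ID is stable within a panel, but it does not say IDs are unique across panels, so it seems safer to carry the panel year along as part of the identifier.

```python
# Toy merged Set C + Set F data for three panels; all values are invented.
merged_by_panel = {
    1993: [{"id": "A1", "wave": 1, "srefmon": 1, "married": 1, "income": 2000}],
    1996: [
        {"id": "A1", "wave": 1, "srefmon": 1, "married": 0, "income": 2200},
        {"id": "C3", "wave": 1, "srefmon": 1, "married": 1, "income": 3100},
    ],
    2001: [{"id": "B2", "wave": 1, "srefmon": 1, "married": 0, "income": 1800}],
}


def append_panels(panels):
    """Stack the rows of every panel, tagging each row with its panel year."""
    stacked = []
    for panel, rows in panels.items():
        for row in rows:
            stacked.append({"panel": panel, **row})
    return stacked


pooled = append_panels(merged_by_panel)

# With the panel year included, the identifier stays unique even though
# the toy id "A1" appears in both the 1993 and 1996 panels.
keys = {(r["panel"], r["id"], r["wave"], r["srefmon"]) for r in pooled}
print(len(pooled), len(keys))
```

In Stata, the stacking step is done with append using, after generating a panel variable in each file.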
