Project Description
Summary:
The Panel Study of Income Dynamics–Social, Health, and Economic Longitudinal File (PSID-SHELF) provides an easy-to-use and harmonized longitudinal file for the Panel Study of Income Dynamics (PSID), the longest-running nationally representative household panel survey in the world.
The first major benefit of PSID-SHELF is that it provides users with a longitudinal data file that features the complete sample of the PSID's multigenerational panel. The current version of PSID-SHELF includes 42 waves of survey data, ranging from 1968 to 2021. Every individual who has ever been observed in the PSID Main Study is included in PSID-SHELF. There are over 8,000 sample families, comprising more than 900,000 observations from roughly 53,000 sample members (and an additional 30,000 nonsample individuals who have ever lived in a PSID family unit).
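The wave count above can be checked with a few lines of arithmetic. This sketch assumes a detail that is not stated in the text: that the PSID interviewed annually from 1968 through 1997 and every two years from 1999 onward.

```python
# Assumption (not stated in the text above): annual interviews from 1968
# through 1997, then biennial interviews from 1999 onward.
annual_waves = list(range(1968, 1998))       # 1968, 1969, ..., 1997
biennial_waves = list(range(1999, 2022, 2))  # 1999, 2001, ..., 2021
waves = annual_waves + biennial_waves

print(len(annual_waves), len(biennial_waves), len(waves))
```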
The second major benefit of PSID-SHELF is that it features a novel set of harmonized measures on a wide range of substantive topics, including: (1) social characteristics (e.g., demographics, family type, education, race and ethnicity); (2) health characteristics (e.g., chronic conditions, COVID-19, dementia, disability); and (3) economic characteristics (e.g., earnings, family income, occupations, wealth). It also includes the PSID's essential administrative variables (e.g., survey identifiers, panel status, sample weights, household relationship records). Consequently, PSID-SHELF covers some of the most central variables in the PSID that have been collected for up to five decades.
PSID-SHELF can be used as a standalone data file, or it can easily be merged with other PSID data products to add further public-use variables, by linking variables to a participant’s individual and family unit identifiers. The harmonized longitudinal file accentuates the PSID's strengths through its household panel structure, which follows the same families over multiple decades, and its multigenerational genealogical design, which follows the descendants of PSID families that were originally sampled in 1968, with immigrant refresher samples in 1997–1999 and 2017–2019.
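As a rough illustration of linking by identifiers, the sketch below left-joins extra variables onto PSID-SHELF-style records. The column names (person_id, family_id, year, family_income, volunteer_hours) and all values are hypothetical placeholders, not actual PSID or PSID-SHELF variable names; the real identifier names are documented in the PSID-SHELF User Guide and Codebook.

```python
def merge_on_keys(shelf_rows, extra_rows, keys=("person_id", "year")):
    """Left-join extra_rows onto shelf_rows using the given key columns."""
    lookup = {}
    for row in extra_rows:
        key = tuple(row[k] for k in keys)
        if key in lookup:
            # A repeated key would make the merge ambiguous.
            raise ValueError(f"duplicate key in extra_rows: {key}")
        lookup[key] = row

    merged = []
    for row in shelf_rows:
        key = tuple(row[k] for k in keys)
        extra = lookup.get(key, {})
        # Keep every PSID-SHELF row; add only columns it does not already have.
        new_cols = {c: v for c, v in extra.items() if c not in row}
        merged.append({**row, **new_cols})
    return merged


# Toy records with hypothetical column names and made-up values.
shelf = [
    {"person_id": 4001, "family_id": 17, "year": 2019, "family_income": 52000},
    {"person_id": 4001, "family_id": 17, "year": 2021, "family_income": 55000},
    {"person_id": 4002, "family_id": 17, "year": 2021, "family_income": 55000},
]
extra = [
    {"person_id": 4001, "year": 2021, "volunteer_hours": 12},
]

merged = merge_on_keys(shelf, extra)
```

Rows without a match are kept unchanged, so the PSID-SHELF sample is never reduced by the merge. For family-level variables, the same function could be called with keys=("family_id", "year").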
Although the PSID strives to ensure longitudinally consistent measurement, there are a number of variables that have changed across waves (e.g., because of new code frames, top-codes, question splitting, or other changes to the survey interview). Harmonizing such variables, by necessity, involves analytic decisions that users may or may not agree with. These decisions are described at a high level in the PSID-SHELF User Guide and Codebook, but only a close review of the construction files that were used to generate PSID-SHELF can fully reveal each analytic decision. The Stata code underlying PSID-SHELF is publicly available not only to allow for such review but also to encourage users, as they become more comfortable with the PSID, to use and alter the full code or selected code snippets for their own analytic purposes.
Despite multiple code reviews, it is possible that the files used to produce PSID-SHELF contain errors. As such, we encourage users to review the code carefully and to report any mistakes or errors they identify to us (psidshelf.help@umich.edu). The authors wish to underscore that PSID-SHELF is currently being shared as a data product in beta, and users are responsible for any errors arising from the provided code and files.
Current Version
2025-01 (data release number).
Permanent DOI
DOI:10.3886/E194322 (data).
DOI:10.7302/25205 (documentation).
Recommended Citations
Please cite PSID-SHELF in any product that makes use of the data or documentation. Anyone who uses PSID-SHELF should cite the data or the PSID-SHELF User Guide and Codebook and, as required by the PSID user agreement, the PSID Main Study.
PSID-SHELF data:
Pfeffer, Fabian T., Davis Daumler, and Esther Friedman. PSID-SHELF, 1968–2021: The PSID’s Social, Health, and Economic Longitudinal File (PSID-SHELF), Beta Release. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], <date last modified>. DOI:10.3886/E194322.
PSID-SHELF User Guide and Codebook:
Daumler, Davis, Esther Friedman, and Fabian T. Pfeffer. 2025. PSID-SHELF User Guide and Codebook, 1968–2021, Beta Release. PSID-SHELF Data Documentation 2025-01. Ann Arbor, MI: Survey Research Center, Institute for Social Research, University of Michigan. DOI:10.7302/25205.
The PSID Main Study data:
Panel Study of Income Dynamics, public-use dataset <or "restricted-use data," if appropriate>. Produced and distributed by the Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI: <year of data retrieval>.
Funding Sources:
United States Department of Health and Human Services. National Institutes of Health. National Institute on Aging (R01AG040213);
United States Department of Health and Human Services. National Institutes of Health. Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01HD069609, R01AG040213);
National Science Foundation. Directorate for Social, Behavioral and Economic Sciences (SES1157698, SES1623684)
Scope of Project
Subject Terms:
social;
health;
economic;
psid;
panel study of income dynamics;
longitudinal;
data;
harmonized;
measures;
multigenerational;
sample;
demographics;
education;
family type;
geography;
race and ethnicity;
time use;
chronic conditions;
COVID-19;
dementia;
depression;
disability;
general wellbeing;
earnings;
employment;
expenditures;
family income;
occupations;
primary home;
wealth;
parent records;
child records;
marriage records
Methodology
Sampling:
The PSID follows a complex survey design that consists of five different subsamples.
In 1968, the original PSID sample consisted of 4,802 families, which were drawn from two different subsamples: (1) a nationally representative sample of 2,930 families designed by the Survey Research Center at the University of Michigan (i.e., the SRC sample); and (2) an oversample of 1,872 low-income families from the Survey of Economic Opportunity designed by the U.S. Census Bureau (i.e., the Census/SEO sample). The low-income oversample was included to facilitate the investigation of poverty-related issues. When combined, these SRC and Census/SEO samples constitute a national probability sample of families living in the United States, as of 1968.
Between 1990 and 1992, the PSID added 2,308 Latino families, which included individuals who were originally from Mexico, Puerto Rico, and Cuba. Although this sample represented three major groups of immigrants, it did not fully represent all of the immigrants to the United States from 1969 to 1990. Because of this crucial shortcoming, and a lack of sufficient funding, the Latino sample was dropped after 1995.
Between 1997 and 1999, the PSID added a nationally representative sample of 511 families that included individuals who (a) immigrated to the United States from 1969 to 1997, and (b) are not married to a person who was living in the United States in 1968; or, alternatively, (c) were children who were born after 1968 to parents who were not living in the United States in 1968.
Between 2017 and 2019, the PSID added a nationally representative sample of 481 families that included at least one member of the reference couple (i.e., either the reference person or spouse/partner) who (a) immigrated to the United States from 1998 to 2016, when the screening effort took place; or, alternatively, (b) was a child who was born after 1997 to parents who were not living in the United States in 1997. The purpose of the immigrant refresher samples was to ensure that the PSID remained representative of the current U.S. population, as of the addition of the most recent refresher sample.
Collection Mode(s):
computer-assisted personal interview (CAPI);
computer-assisted telephone interview (CATI);
face-to-face interview;
telephone interview
Weights:
The PSID uses sample weights to account for the differential probabilities of selection into the panel—due to the PSID's complex survey design, its multiple subsamples, and panel attrition. When sample weights are used, the PSID provides a nationally representative portrait of the noninstitutionalized population of the United States.
PSID-SHELF provides a primary set of sample weights that are available in every survey year: (1) family longitudinal weights and individual longitudinal weights. While the family longitudinal weights are calculated at the level of the family unit (and assigned to every current member of the family unit), the individual longitudinal weights are only assigned to sample persons (i.e., nonsample persons are not assigned an individual weight).
For the survey years from 1990 to 1995, PSID-SHELF provides two additional sets of sample weights. (2) For analyses that combine members of the original samples with members of the Latino sample, users should select the sample weights that were designed for the joint estimation of the Latino and main samples. (3) For analyses that solely examine the members of the Latino sample (i.e., analyses that do not include any individual from the PSID's main samples), users should select the sample weights that were designed for the exclusive estimation of the Latino sample. Finally, for analyses that only include the PSID's original samples (i.e., SRC and Census/SEO), users should continue to select the default family and individual longitudinal weights.
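The 1990 to 1995 guidance can be written down as a small decision rule. This is only a sketch of the rule as described here: the function name is hypothetical, and the returned labels are plain descriptions rather than PSID-SHELF variable names.

```python
def weight_set_1990_1995(includes_main, includes_latino):
    """Pick a weight set for 1990-1995 by which samples the analysis uses.

    includes_main:   analysis includes SRC and/or Census/SEO sample members
    includes_latino: analysis includes Latino sample members
    """
    if not (includes_main or includes_latino):
        raise ValueError("the analysis must include at least one sample")
    if includes_main and includes_latino:
        return "joint Latino and main sample weights"
    if includes_latino:
        return "Latino-only sample weights"
    return "default family and individual longitudinal weights"


print(weight_set_1990_1995(includes_main=True, includes_latino=True))
```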
In 1997, the PSID introduced a fourth set of sample weights: (4) family cross-sectional weights and individual cross-sectional weights. The family cross-sectional weights are available from 1997 to 2003 and, after a 12-year hiatus, from 2015 until present; and the individual cross-sectional weights are available from 1997 until present, without interruption. The benefits of using the PSID's cross-sectional weights are two-fold. First, the individual cross-sectional weights are assigned to both sample and nonsample persons (unlike the individual longitudinal weights, which are only assigned to sample persons). This means that the cross-sectional weights permit the user to take advantage of all available data for individual-level analyses. Second, the cross-sectional weights are calibrated to the country's current population characteristics, based on the Current Population Survey (CPS) or the American Community Survey (ACS). By accounting for year-to-year changes to population characteristics, the calibrating procedure also serves as an adjustment for the PSID's non-coverage of immigrant populations who entered the United States after 2017 (Chang et al. 2023). By contrast, the PSID's longitudinal weights are not calibrated at each wave against external, nationally representative population estimates.
The PSID's longitudinal weights should be used for any analysis that includes information from two or more waves of data. Additionally, a user who is interested in reporting a time series of repeated cross-sectional estimates should consider using the longitudinal weight in each wave, due to the consistency in the methodology used to derive the longitudinal weights across survey years. On the other hand, the PSID's cross-sectional weights are well suited for analyses that use information from only one wave, especially if a user would like to draw inferences from the full set of sample and nonsample persons within a particular year. Prior to 1997, users who wish to conduct cross-sectional analyses are advised to use the PSID's longitudinal weights, while recognizing that their analyses will be based solely on PSID sample persons.
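The choice between longitudinal and cross-sectional weights can be sketched as a short helper. The function names are hypothetical, and the availability years are taken from the description above. One rule is an assumption not spelled out in the text: falling back to longitudinal weights for single-wave, family-level analyses in years without family cross-sectional weights.

```python
def cross_sectional_available(year, unit):
    """Whether cross-sectional weights exist for this year, per the text."""
    if unit == "individual":
        return year >= 1997
    if unit == "family":
        return 1997 <= year <= 2003 or year >= 2015
    raise ValueError("unit must be 'individual' or 'family'")


def recommended_weight(waves, unit="individual"):
    """Suggest a weight type for an analysis that uses the given waves."""
    years = sorted(set(waves))
    if not years:
        raise ValueError("at least one wave is required")
    if len(years) >= 2:
        # Two or more waves, including repeated cross-sections over time.
        return "longitudinal"
    if cross_sectional_available(years[0], unit):
        return "cross-sectional"
    # Before 1997 the text advises longitudinal weights. Applying the same
    # fallback to family-level analyses during the hiatus is an assumption.
    return "longitudinal"


print(recommended_weight([2019]))
print(recommended_weight([2017, 2019]))
print(recommended_weight([1985]))
```

Keep in mind that the individual longitudinal weights cover sample persons only, so a fallback to them leaves nonsample persons out of the analysis.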
Finally, PSID-SHELF provides two variables that account for the PSID's complex sampling design. The stratum and cluster variables are used for computing complex-sample-design-corrected standard errors and variance estimates via the Taylor Series Linearization or Repeated Replication methods. These variables may be used with a variety of software programs that incorporate the complex sample design into variance estimation, including Stata, SAS, SUDAAN, SPSS, and others.
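To illustrate how stratum and cluster variables enter the calculation, the sketch below computes a weighted mean and its Taylor series linearization standard error under the common with-replacement approximation, in which clusters are treated as sampled independently within strata. It is not the PSID's own procedure. The tuple layout and the toy values are hypothetical, and in practice the survey routines of the packages listed above should be preferred.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(rows):
    """rows: iterable of (stratum, cluster, weight, y) tuples.

    Returns the weighted mean of y and its linearized standard error.
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w

    # Linearized score of the ratio estimator, summed within each cluster.
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in rows:
        cluster_totals[stratum][cluster] += w * (y - mean) / total_w

    variance = 0.0
    for stratum, clusters in cluster_totals.items():
        n = len(clusters)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 clusters")
        totals = list(clusters.values())
        centre = sum(totals) / n
        # Between-cluster variation within the stratum.
        variance += n / (n - 1) * sum((t - centre) ** 2 for t in totals)

    return mean, sqrt(variance)


# Toy data: two strata with two clusters each (made-up values).
toy_rows = [
    (1, 1, 1.0, 1.0),
    (1, 2, 1.0, 3.0),
    (2, 1, 1.0, 2.0),
    (2, 2, 1.0, 6.0),
]
mean, se = weighted_mean_with_se(toy_rows)
print(mean, se)
```

Only the variation between cluster totals within each stratum contributes to the variance, which is why both design variables are needed in addition to the weights.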
Related Publications