awunderground

Reputation: 134

How can I read a ".da" file directly into R?

I want to work with the Health and Retirement Study (HRS) in R. Their website provides ".da" files and a SAS extract program. The SAS program reads each ".da" file as a fixed-width file:

libname EXTRACT 'c:\hrs1994\sas\' ; 

DATA EXTRACT.W2H;
INFILE 'c:\hrs1994\data\W2H.DA'  LRECL=358; 

INPUT 
  HHID $ 1-6
  PN $ 7-9
  CSUBHH $ 10-10
  ETC ETC    
;

LABEL
  HHID ="HOUSEHOLD IDENTIFIER"
  PN ="PERSON NUMBER"
  CSUBHH ="1994 SUB-HOUSEHOLD IDENTIFIER"
  ASUBHH ="1992 SUB-HOUSEHOLD IDENTIFIER"
  ETC ETC
;

1) What type of file is this? I can't find anything about this file type.

2) Is there an easy way to read this into R without the intermediate step of exporting a .csv from SAS? Is there a way for read.fwf() to work without explicitly stating hundreds of variable names?

Thank you!

Upvotes: 2

Views: 2997

Answers (2)

Toni

Reputation: 21

Thank you for this post! I just used it to extract the HRS2020 data.

Update for HRS2020: after removing the last row of the dictionary data frame, you also need to remove its first row.

# Remove first row
df.dict <- df.dict[-1,]
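
Since the rows to drop apparently differ between releases, a less position-dependent option may be to keep only the lines that define a variable. The following is a minimal sketch, assuming every variable line in the .DCT file begins with `_column(`; the toy dictionary text below is hypothetical and only modeled on that layout, so check it against your own file first.

```r
# Hypothetical dictionary contents; a real file would come from readLines(dict.file)
dct.lines <- c(
  "infile dictionary {",
  "_column(1) str6 HHID %6s \"HOUSEHOLD IDENTIFIER\"",
  "_column(7) str3 PN %3s \"PERSON NUMBER\"",
  "}"
)

# Keep only the lines that start with _column( (assumed marker of a variable line)
var.lines <- dct.lines[grepl("^\\s*_column\\(", dct.lines)]

# Parse the remaining lines; the quoted label is read as a single field
df.dict <- read.table(text = var.lines, stringsAsFactors = FALSE,
                      col.names = c("col.num", "col.type", "col.name",
                                    "col.width", "col.lbl"))

print(df.dict$col.name)
```

With this approach the header and the closing } are dropped by pattern rather than by row index, so no release-specific row removal should be needed.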

Upvotes: 2

Matt Jewett

Reputation: 3369

After a little more research, it appears that you can use the Stata dictionary files (*.DCT) to retrieve the column layout of the data files (*.DA). For this to work, you need to download both the "Data files" .zip file and the "Stata data descriptors" .zip file from the HRS website. When processing the files, make sure each data file is paired with its matching dictionary file. For example, use "W2FA.DCT" to define "W2FA.DA".

library(readr)

# Set path to the data file "*.DA"
data.file <- "C:/h94da/W2FA.DA"

# Set path to the dictionary file "*.DCT"
dict.file <- "C:/h94sta/W2FA.DCT"

# Read the dictionary file
df.dict <- read.table(dict.file, skip = 1, fill = TRUE, stringsAsFactors = FALSE)

# Set column names for dictionary dataframe
colnames(df.dict) <- c("col.num","col.type","col.name","col.width","col.lbl")

# Remove last row which only contains a closing }
df.dict <- df.dict[-nrow(df.dict),]

# Extract numeric value from column width field
df.dict$col.width <- as.integer(gsub("[^0-9\\.]", "", df.dict$col.width))

# Convert column types to format to be used with read_fwf function
df.dict$col.type <- ifelse(df.dict$col.type %in% c("int","byte","long"), "i", ifelse(df.dict$col.type == "float", "n", ifelse(df.dict$col.type == "double", "d", "c")))

# Read the data file into a dataframe
df <- read_fwf(file = data.file, fwf_widths(widths = df.dict$col.width, col_names = df.dict$col.name), col_types = paste(df.dict$col.type, collapse = ""))

# Store the variable labels as an attribute of the dataframe
attributes(df)$variable.labels <- df.dict$col.lbl
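
Since the question asked about read.fwf(), here is a minimal sketch of how the same dictionary could drive base R instead of readr. The two-variable dictionary and the tiny data file are hypothetical stand-ins for the parsed .DCT and the real .DA file, and reading everything as character is just one possible choice that keeps leading zeros in identifiers.

```r
# Hypothetical dictionary, already parsed into names, widths and labels
df.dict <- data.frame(
  col.name  = c("HHID", "PN"),
  col.width = c(6L, 3L),
  col.lbl   = c("HOUSEHOLD IDENTIFIER", "PERSON NUMBER"),
  stringsAsFactors = FALSE
)

# Write a tiny fixed-width file to a temporary location for the demo
data.file <- tempfile(fileext = ".DA")
writeLines(c("000001010", "000002020"), data.file)

# Read all columns as character so leading zeros are not stripped
df <- read.fwf(data.file, widths = df.dict$col.width,
               col.names = df.dict$col.name, colClasses = "character")

# Named vector for looking up a label by variable name
var.labels <- setNames(df.dict$col.lbl, df.dict$col.name)
print(var.labels[["PN"]])
```

Keeping the labels in a named vector is only one option; it makes a lookup such as var.labels[["HHID"]] possible without relying on column order.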

Upvotes: 3
