How to read this particular .dat file in R?

Question

I have successfully downloaded the CPS supplement data from here:

https://www.census.gov/data/datasets/time-series/demo/cps/cps-supp_cps-repwgt/cps-voting.2018.html

and after unzipping, I now have the file: nov18pub.dat

I have tried many variations of read.table and read.csv on this file, but I am unable to view the data. Can anyone help me turn it into a workable data frame?

Ben · Accepted Answer

This looks like a fixed-width data file.

If you read in the first 10 lines and look at the length:

# Read the first 10 records and count the characters in each
con <- file("nov18pub.dat", "r")
lines <- readLines(con, n = 10)
num_char <- nchar(lines)
close(con)

num_char
[1] 1018 1018 1018 1018 1018 1018 1018 1018 1018 1018

each record appears to be 1018 characters long.
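The check above only covers the first 10 lines. To confirm that every record in the file has the same width, one option is to tabulate the line lengths for the whole file. Here is a minimal sketch, using a small synthetic file as a stand-in for nov18pub.dat (the file contents and the name record_lengths are made up for illustration):

```r
# Synthetic stand-in for the real file: three records of equal width.
tmp <- tempfile(fileext = ".dat")
writeLines(c("AAAA1111", "BBBB2222", "CCCC3333"), tmp)

# Count how many records there are of each length.
# A single entry in the table means every record has the same width.
record_lengths <- table(nchar(readLines(tmp)))
unlink(tmp)

record_lengths
```

For the real file, replacing tmp with "nov18pub.dat" should work the same way, assuming the whole file fits comfortably in memory.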

You can use read.fwf from the utils package (part of base R and loaded by default) to read in the file. In this example, only the first 5 columns are read in. The widths add up to 26 characters, and read.fwf ignores the rest of each record.

# Widths of the first 5 fields; characters past position 26 are not read
read.fwf("nov18pub.dat",
         widths = c(15, 2, 4, 2, 3),
         header = FALSE,
         col.names = c("Household_ID", "Month", "Year", "Line_Number", "Final_Outcome"),
         colClasses = c("character", rep("numeric", 4))
)

       Household_ID Month Year Line_Number Final_Outcome
1   000110118096587    11 2018           2           201
2   000110118096587    11 2018           2           201
3   710004140617571    11 2018           1           201
4   761077501690006    11 2018           1           201
5   761077501690006    11 2018           1           201
6   067091706007561    11 2018           1           201
7   067091706007561    11 2018           1           201
8   067091706007561    11 2018           1           201
9   067091706007561    11 2018           1           201
10  691715007600067    11 2018           2           201
...
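read.fwf only reads as many characters as the widths add up to, so fields to the right of the listed ones are simply left out. To skip fields in the middle of a record, negative widths can be used. Here is a minimal sketch on a made-up layout written to a temporary file (the field names and widths are illustrative and not taken from the CPS documentation):

```r
# Write two synthetic fixed-width records to a temporary file.
# Hypothetical layout: id (5 chars), filler (3 chars), year (4 chars),
# followed by 4 trailing characters we do not need.
tmp <- tempfile(fileext = ".dat")
writeLines(c("00001XXX2018ZZZZ",
             "00002YYY2019ZZZZ"), tmp)

# A negative width skips that many characters. The trailing "ZZZZ" is
# dropped because the widths stop before it.
demo <- read.fwf(tmp,
                 widths = c(5, -3, 4),
                 col.names = c("id", "year"),
                 colClasses = c("character", "numeric"))
unlink(tmp)

demo
```

Reading id as character keeps its leading zeros, which is the same reason Household_ID is read as character above.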

This document provides details on the column widths and codes for each variable:

https://www2.census.gov/programs-surveys/cps/techdocs/cpsnov18.pdf
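Record layouts of this kind are usually given as start and end positions rather than widths. Assuming that is the case here, a small helper can turn positions into a widths vector for read.fwf, with negative widths for the gaps. The function positions_to_widths and the positions passed to it are hypothetical, so look up the real positions in the PDF:

```r
# Hypothetical helper: turn start/end positions (1-based, inclusive, sorted,
# non-overlapping) into a read.fwf widths vector, using negative widths
# for the gaps between the requested fields.
positions_to_widths <- function(start, end) {
  widths <- integer(0)
  cursor <- 1L                                # next unread character position
  for (i in seq_along(start)) {
    gap <- start[i] - cursor
    if (gap > 0) widths <- c(widths, -gap)    # skip characters before the field
    widths <- c(widths, end[i] - start[i] + 1L)
    cursor <- end[i] + 1L
  }
  widths
}

# Made-up positions, for illustration only.
w <- positions_to_widths(start = c(1, 18, 30), end = c(15, 21, 31))
w
```

With these made-up positions the result is 15, -2, 4, -8, 2, which can be passed directly as the widths argument of read.fwf.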

It also describes the file format:

Structure: Rectangular.

File Size: 143,050 logical records; 968 character logical record length.

However, with the supplement data, the record length appears to extend to 1018 characters.
