Reputation: 1499
I have successfully downloaded the CPS supplement data from here:
https://www.census.gov/data/datasets/time-series/demo/cps/cps-supp_cps-repwgt/cps-voting.2018.html
and after unzipping, I now have the file: nov18pub.dat
I have tried many different read.table and read.csv forms on this data but I am unable to view this data. Can anyone help me with turning this data into a workable df?
Upvotes: 2
Views: 7242
Reputation: 30474
This looks like a fixed-width data file.
If you read in the first 10 lines and look at the length:
# Read only the first 10 lines and count the characters in each
con <- file("nov18pub.dat", "r")
line <- readLines(con, n = 10)
num_char <- nchar(line)
close(con)
num_char
[1] 1018 1018 1018 1018 1018 1018 1018 1018 1018 1018
each line appears to be 1018 characters long.
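You can use read.fwf (from the utils package, which is attached by default in R, not from readr) to read in the file. In this example, only the first 5 columns are read in; the rest of each line is ignored because no widths are supplied for it.
read.fwf("nov18pub.dat",
  # widths of the first five fields (26 characters in total)
  widths = c(15, 2, 4, 2, 3),
  header = FALSE,
  col.names = c("Household_ID", "Month", "Year", "Line_Number", "Final_Outcome"),
  # keep the ID as text so leading zeros are preserved
  colClasses = c("character", rep("numeric", 4))
)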
      Household_ID Month Year Line_Number Final_Outcome
1  000110118096587    11 2018           2           201
2  000110118096587    11 2018           2           201
3  710004140617571    11 2018           1           201
4  761077501690006    11 2018           1           201
5  761077501690006    11 2018           1           201
6  067091706007561    11 2018           1           201
7  067091706007561    11 2018           1           201
8  067091706007561    11 2018           1           201
9  067091706007561    11 2018           1           201
10 691715007600067    11 2018           2           201
...
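If you would rather use the readr package, a rough equivalent is read_fwf with fwf_widths. This is only a sketch: it assumes readr is installed, and it runs on two made-up 30-character records written to a temporary file, so the values are illustrative and not real CPS data. For the actual file you would pass "nov18pub.dat" instead of tmp.

```r
# Two synthetic fixed-width records (made up for illustration, not real CPS data).
# Layout: 15-char ID, 2-char month, 4-char year, 2-char line number,
# 3-char outcome, then 4 filler characters that are not read.
sample_lines <- c(
  "000000000000001112018 1201XXXX",
  "000000000000002112018 2201XXXX"
)
tmp <- tempfile(fileext = ".dat")
writeLines(sample_lines, tmp)

cps <- readr::read_fwf(
  tmp,
  col_positions = readr::fwf_widths(
    c(15, 2, 4, 2, 3),
    col_names = c("Household_ID", "Month", "Year", "Line_Number", "Final_Outcome")
  ),
  # "c" = character (keeps leading zeros in the ID), "i" = integer
  col_types = "ciiii"
)
print(cps)
```

The result is a tibble rather than a plain data frame; wrap it in as.data.frame() if you need the latter.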
This document provides details on the column widths and codes for each variable:
https://www2.census.gov/programs-surveys/cps/techdocs/cpsnov18.pdf
In there, it mentions the format as well:
Structure: Rectangular.
File Size: 143,050 logical records; 968 character logical record length.
However, with the supplement data included, the record length appears to be 1018 characters.
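The codebook lists variables by start and end position rather than by width. One way to turn those positions into read.fwf arguments is sketched here: the gaps between the fields you want become negative widths, which read.fwf skips. The helper name positions_to_widths is hypothetical, the records are made up, and the positions used are simply the ones implied by the widths in the example above, so check them against the PDF before relying on them.

```r
# Hypothetical helper: convert 1-based, inclusive start/end positions into
# read.fwf widths, using negative widths for the gaps to be skipped.
positions_to_widths <- function(start, end) {
  stopifnot(length(start) == length(end), all(end >= start))
  widths <- numeric(0)
  cursor <- 1  # next unread character position
  for (i in seq_along(start)) {
    stopifnot(start[i] >= cursor)           # fields must be ordered and non-overlapping
    gap <- start[i] - cursor
    if (gap > 0) widths <- c(widths, -gap)  # skip characters between fields
    widths <- c(widths, end[i] - start[i] + 1)
    cursor <- end[i] + 1
  }
  widths
}

# Keep only the ID (1-15), year (18-21) and final outcome (24-26).
widths <- positions_to_widths(start = c(1, 18, 24), end = c(15, 21, 26))

# Two synthetic records (not real CPS data) written to a temporary file.
tmp <- tempfile(fileext = ".dat")
writeLines(c("000000000000001112018 1201XXXX",
             "000000000000002112018 2201XXXX"), tmp)

cps <- read.fwf(
  tmp,
  widths = widths,
  col.names = c("Household_ID", "Year", "Final_Outcome"),
  colClasses = c("character", "integer", "integer")
)
print(cps)
```

Note that col.names and colClasses describe only the fields that are kept; skipped fields do not get an entry.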
Upvotes: 0
Reputation: 889
Please try the following:
Import Dataset > From Text (readr) > Browse > select file > Delimiter > select Whitespace > Import
You just need to try various delimiters; Whitespace worked for me.
Happy data'ing.
Upvotes: 1