Reputation: 438
I have a text file with columnar data which I want to read into an R data frame. Here is how the text file looks.
absd kjehfjef
kjldnflknkdsnfds lkdld
kjdkskjdjjsbdjsjs
jkdkjbfsjbsd
ajdkjsadasjd
jksabdkjabdw
STATE WEEK WEEK WEEK CUM CUM CUM CUM CUM
TOTAL DEV DEV TOTAL DEV DEV DEV DEV
FROM FROM FROM FROM FROM FROM
NORM L YR NORM L YR NORM L YR
PRCT PRCT
ALABAMA 11 11 11 2137 274 217 15 11
ARIZONA 0 0 0 2622 -341 45 -12 2
ARKANSAS 1 1 1 1899 99 275 6 17
CALIFORNIA 0 0 0 1195 345 -6 41 0
The data frame should look like this.
STATE WEEK WEEK WEEK CUM CUM CUM CUM CUM
TOTAL DEV DEV TOTAL DEV DEV DEV DEV
FROM FROM FROM FROM FROM FROM
NORM L YR NORM L YR NORM L YR
PRCT PRCT
ALABAMA 11 11 11 2137 274 217 15 11
ARIZONA 0 0 0 2622 -341 45 -12 2
ARKANSAS 1 1 1 1899 99 275 6 17
CALIFORNIA 0 0 0 1195 345 -6 41 0
I want to ignore the junk at the beginning, and just capture the table into a data frame. One problem is that the column headings stretch over multiple lines.
Can someone help me achieve this?
I tried
a <- read.table(text = <file name>, skip = grep("-34 more junk is here", <file name>))
but I got the error "no lines available in input".
Upvotes: 0
Views: 425
Reputation: 652
You can easily do what you want using data.table. As the other answer points out, the hard part is retrieving the column names.
data.table::fread(
file = <file_name>,
skip = 1
)
#> V1 V2 V3 V4 V5 V6 V7 V8 V9
#> 1: ALABAMA 11 11 11 2137 274 217 15 11
#> 2: ARIZONA 0 0 0 2622 -341 45 -12 2
#> 3: ARKANSAS 1 1 1 1899 99 275 6 17
#> 4: CALIFORNIA 0 0 0 1195 345 -6 41 0
Created on 2020-02-25 by the reprex package (v0.3.0)
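If you also want meaningful column names, one option is to assign them afterwards with setnames(). This is only a sketch: the combined names below are my own reading of how the five stacked header lines map onto the nine columns, and the sample rows are rebuilt inline for illustration.

```r
library(data.table)

# Rebuild the sample data rows inline for illustration; with the real
# file you would use fread(file = <file_name>, skip = 1) as above.
txt <- "ALABAMA 11 11 11 2137 274 217 15 11
ARIZONA 0 0 0 2622 -341 45 -12 2
ARKANSAS 1 1 1 1899 99 275 6 17
CALIFORNIA 0 0 0 1195 345 -6 41 0"

dt <- fread(text = txt)

# Assumed reading of the stacked header lines: one combined name per column
setnames(dt, c("STATE",
               "WEEK_TOTAL", "WEEK_DEV_FROM_NORM", "WEEK_DEV_FROM_L_YR",
               "CUM_TOTAL",  "CUM_DEV_FROM_NORM",  "CUM_DEV_FROM_L_YR",
               "CUM_DEV_FROM_NORM_PRCT", "CUM_DEV_FROM_L_YR_PRCT"))
dt
```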
You can also try read.delim, but the result is ugly as a data frame object.
read.delim(
file = <file_name>,
skip = 6,
col.names = "v"
)
#> v
#> 1 STATE WEEK WEEK WEEK CUM CUM CUM CUM CUM
#> 2 TOTAL DEV DEV TOTAL DEV DEV DEV DEV
#> 3 FROM FROM FROM FROM FROM FROM
#> 4 NORM L YR NORM L YR NORM L YR
#> 5 PRCT PRCT
#> 6 ALABAMA 11 11 11 2137 274 217 15 11
#> 7 ARIZONA 0 0 0 2622 -341 45 -12 2
#> 8 ARKANSAS 1 1 1 1899 99 275 6 17
#> 9 CALIFORNIA 0 0 0 1195 345 -6 41 0
Upvotes: 0
Reputation: 18663
Try
library(data.table)
a <- fread(file="junk.txt", skip="ALABAMA")
Then either add your own column names manually or, if there are too many, try to read them in from the file, perhaps with read.fwf() or scan().
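A rough sketch of that idea, assuming (as in the question) six junk lines followed by five header lines. A temporary file stands in for junk.txt here, and the combined column names are my own interpretation of the stacked headers:

```r
library(data.table)

# Recreate the sample file; in practice you would already have junk.txt.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "absd kjehfjef",
  "kjldnflknkdsnfds lkdld",
  "kjdkskjdjjsbdjsjs",
  "jkdkjbfsjbsd",
  "ajdkjsadasjd",
  "jksabdkjabdw",
  "STATE WEEK WEEK WEEK CUM CUM CUM CUM CUM",
  "TOTAL DEV DEV TOTAL DEV DEV DEV DEV",
  "FROM FROM FROM FROM FROM FROM",
  "NORM L YR NORM L YR NORM L YR",
  "PRCT PRCT",
  "ALABAMA 11 11 11 2137 274 217 15 11",
  "ARIZONA 0 0 0 2622 -341 45 -12 2",
  "ARKANSAS 1 1 1 1899 99 275 6 17",
  "CALIFORNIA 0 0 0 1195 345 -6 41 0"
), tmp)

a <- fread(file = tmp, skip = "ALABAMA")  # start at the first data row
hdr <- readLines(tmp)[7:11]               # the five header lines, kept for reference

# Assign names by hand (assumed mapping of the stacked headers to columns)
setnames(a, c("STATE",
              "WEEK_TOTAL", "WEEK_DEV_FROM_NORM", "WEEK_DEV_FROM_L_YR",
              "CUM_TOTAL",  "CUM_DEV_FROM_NORM",  "CUM_DEV_FROM_L_YR",
              "CUM_DEV_FROM_NORM_PRCT", "CUM_DEV_FROM_L_YR_PRCT"))
```

If the header lines in the real file are fixed-width aligned, read.fwf() with the known column widths could paste the pieces of hdr together per column instead of naming them by hand.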
From the help page of fread:
skip
If 0 (default) start on the first line and from there finds the first row with a consistent number of columns. This automatically avoids irregular header information before the column names row. skip>0 means ignore the first skip rows manually. skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata).
Upvotes: 1