user17144

Reputation: 438

Reading a text file at a URL into an R data frame

I have a text file with columnar data which I want to read into an R data frame. Here is how the text file looks.

            absd  kjehfjef
     kjldnflknkdsnfds lkdld
              kjdkskjdjjsbdjsjs

  jkdkjbfsjbsd
          ajdkjsadasjd
           jksabdkjabdw



   STATE         WEEK  WEEK WEEK    CUM   CUM   CUM   CUM   CUM
                 TOTAL DEV  DEV     TOTAL DEV   DEV   DEV   DEV
                       FROM FROM          FROM  FROM  FROM  FROM
                       NORM L YR          NORM  L YR  NORM  L YR
                                                      PRCT  PRCT

 ALABAMA            11   11   11    2137   274   217    15    11
 ARIZONA             0    0    0    2622  -341    45   -12     2
 ARKANSAS            1    1    1    1899    99   275     6    17
 CALIFORNIA          0    0    0    1195   345    -6    41     0

The data frame should look like this.

   STATE         WEEK  WEEK WEEK    CUM   CUM   CUM   CUM   CUM
                 TOTAL DEV  DEV     TOTAL DEV   DEV   DEV   DEV
                       FROM FROM          FROM  FROM  FROM  FROM
                       NORM L YR          NORM  L YR  NORM  L YR
                                                      PRCT  PRCT

 ALABAMA            11   11   11    2137   274   217    15    11
 ARIZONA             0    0    0    2622  -341    45   -12     2
 ARKANSAS            1    1    1    1899    99   275     6    17
 CALIFORNIA          0    0    0    1195   345    -6    41     0

I want to ignore the junk at the beginning and capture just the table in a data frame. One complication is that the column headings stretch over multiple lines.

Can someone help me achieve this?

I tried

a <- read.table(text = <file name>, skip = grep("-34 more junk is here", <file name>))

but got the error "no lines available in input".
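From the documentation, read.table()'s text argument expects the data itself rather than a file name, so perhaps something along these lines is closer (a rough, untested sketch with a hypothetical URL), though it still leaves the multi-line column headings unsolved:

u <- "http://example.com/report.txt"   # hypothetical URL
txt <- readLines(u)                    # pull the raw lines down first
start <- grep("ALABAMA", txt)[1]       # first data row
a <- read.table(text = txt[start:length(txt)], header = FALSE)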

Upvotes: 0

Views: 425

Answers (2)

rdornas

Reputation: 652

You can easily do what you want by using data.table::fread(). The catch, as the other answer points out, is that retrieving the column names is the hard part.

data.table::fread(
  file = <file_name>,
  skip = 1
)

#>            V1 V2 V3 V4   V5   V6  V7  V8 V9
#> 1:    ALABAMA 11 11 11 2137  274 217  15 11
#> 2:    ARIZONA  0  0  0 2622 -341  45 -12  2
#> 3:   ARKANSAS  1  1  1 1899   99 275   6 17
#> 4: CALIFORNIA  0  0  0 1195  345  -6  41  0

Created on 2020-02-25 by the reprex package (v0.3.0)
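If you want real names instead of V1-V9, fread() also accepts a col.names argument, so you can supply flattened versions of the multi-line headings yourself. A minimal sketch (the flattened names below are my own invention; adjust to taste):

data.table::fread(
  file = <file_name>,
  skip = 1,
  # flatten the five header lines into one name per column
  col.names = c("STATE",
                "WEEK_TOTAL", "WEEK_DEV_FROM_NORM", "WEEK_DEV_FROM_LYR",
                "CUM_TOTAL", "CUM_DEV_FROM_NORM", "CUM_DEV_FROM_LYR",
                "CUM_DEV_FROM_NORM_PRCT", "CUM_DEV_FROM_LYR_PRCT")
)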

You can also try read.delim(), but the result is an ugly data frame object.

read.delim(
  file = <file_name>,
  skip = 6,
  col.names = "v"
)

#>                                                                  v
#> 1     STATE         WEEK  WEEK WEEK    CUM   CUM   CUM   CUM   CUM
#> 2                   TOTAL DEV  DEV     TOTAL DEV   DEV   DEV   DEV
#> 3                        FROM FROM          FROM  FROM  FROM  FROM
#> 4                        NORM L YR          NORM  L YR  NORM  L YR
#> 5                                                       PRCT  PRCT
#> 6  ALABAMA            11   11   11    2137   274   217    15    11
#> 7  ARIZONA             0    0    0    2622  -341    45   -12     2
#> 8  ARKANSAS            1    1    1    1899    99   275     6    17
#> 9  CALIFORNIA          0    0    0    1195   345    -6    41     0

Upvotes: 0

Edward

Reputation: 18663

Try

library(data.table)
a <- fread(file="junk.txt", skip="ALABAMA") 

Then either add your own column names manually or, if there are too many, attempt to read them in from the file somehow, perhaps using read.fwf() or scan().
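For example, with only nine columns here, setting them by hand after the fact is straightforward; a minimal sketch (the flattened names are my own invention):

library(data.table)
a <- fread(file = "junk.txt", skip = "ALABAMA")
# assign one flattened name per column, left to right
setnames(a, c("STATE",
              "WEEK_TOTAL", "WEEK_DEV_NORM", "WEEK_DEV_LYR",
              "CUM_TOTAL", "CUM_DEV_NORM", "CUM_DEV_LYR",
              "CUM_DEV_NORM_PRCT", "CUM_DEV_LYR_PRCT"))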


From the help page of fread:

skip
If 0 (default) start on the first line and from there finds the first row with a consistent number of columns. This automatically avoids irregular header information before the column names row. skip>0 means ignore the first skip rows manually. skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata).

Upvotes: 1
