John

Reputation: 43279

reading a text file with an irregular header (in R)

I am trying to read a flat file into R.

It is separated by ';' and has 12 leading lines of comments that describe the content. I want to read the file and exclude the comments.

The problem, however, is that comment line 11 contains the column headers, as follows:

# Fields: labno; name; dob; sex; location; date

Is there a way that I can extract the headers from the comments and apply them to the data? The way I thought of doing it was to read the first 11 lines only and store everything from labno onward as a vector. Then I would read everything from line 13 and use the stored vector as column names for the data.

Is there a way to read the first 11 lines and remove everything before labno?

Thanks.

Upvotes: 1

Views: 1531

Answers (1)

IRTFM

Reputation: 263421

Step 1: (read only the eleventh line, which contains the column names, as a single raw string)

hdrs <- readLines("somefile.txt", n=11)[11]

Step 2: (read the rest of the file, splitting on ';' and allowing default automatic names)

dat <- read.table("somefile.txt", skip=12, sep=";")

Step 3: (remove extraneous characters before applying the ‘fields’ as column names)

names(dat) <- scan(textConnection(sub("^# Fields:", "", hdrs)),
                   what="character", sep=";", strip.white=TRUE)

Later versions of R allow ‘scan’ to have a ‘text’ argument rather than requiring the awkward textConnection function.
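
For anyone who wants to try this end to end, here is a minimal sketch of the same three steps using the ‘text’ argument of scan instead of textConnection. It runs against a throwaway file so it is self-contained; the file contents and the names path, hdr_line and col_names are made up for illustration and are not from the question.

```r
# Build a small sample file: 12 comment lines, with line 11 holding the field names.
# (All contents here are invented for illustration.)
path <- tempfile(fileext = ".txt")
comments <- paste("# comment line", 1:12)
comments[11] <- "# Fields: labno; name; dob; sex; location; date"
rows <- c("101;Ann;1980-01-02;F;Leeds;2012-05-01",
          "102;Bob;1975-07-30;M;York;2012-05-02")
writeLines(c(comments, rows), path)

# Grab line 11 as a raw string and drop the "# Fields:" prefix.
hdr_line <- readLines(path, n = 11)[11]
col_names <- scan(text = sub("^# Fields:", "", hdr_line),
                  what = "character", sep = ";",
                  strip.white = TRUE, quiet = TRUE)

# Read the data rows (line 13 onward) and attach the names in one call.
dat <- read.table(path, skip = 12, sep = ";",
                  col.names = col_names, stringsAsFactors = FALSE)
str(dat)
```

Passing col.names to read.table is just a shortcut for assigning names(dat) afterwards; strip.white=TRUE is there because the field names in the comment line are separated by "; " rather than a bare ";".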

Upvotes: 6
