IAMTubby

Reputation: 1667

unable to get column names when using skip along with read.csv

I was using the skip option in read.csv to skip a few lines before reading a csv file into my data frame. However, when I call names(mydf) afterwards, my column names are gone and I get what look like random strings instead. Why does this happen?

> mydf = read.csv("mycsvfile.csv",skip=100)
> names(mydf)
[1] "X2297256" "X3"

Without the skip option, it works fine:

> mydf = read.csv("mycsvfile.csv")
> names(mydf)
[1] "col1" "col2"      

Upvotes: 1

Views: 2103

Answers (2)

ClimateUnboxed

Reputation: 8077

You don't need to read the headers separately. You can do this in one line with negative indexing on the data frame, where a negative index means "keep every row except the ones listed".

So to keep the headers and drop the first N data rows (N being at least 1), you just need this:

mydf <- read.csv("mycsvfile.csv", header=TRUE)[-(1:N), ]
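To see the negative-indexing approach on something reproducible, here is a minimal sketch. The temporary file, its contents, and the value of N are made up for illustration and are not from the question. Note that N here counts data rows after the header, not physical lines in the file, and that the whole file is read before any rows are dropped.

```r
# Throwaway CSV: one header line plus 10 data rows (illustrative data only).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(col1 = 101:110, col2 = 1:10), path, row.names = FALSE)

N <- 3  # hypothetical number of leading data rows to drop; must be >= 1

# Read everything, header included, then drop the first N rows.
full <- read.csv(path, header = TRUE)

# drop = FALSE keeps the result a data frame even for a one-column file.
mydf <- full[-(1:N), , drop = FALSE]

names(mydf)   # "col1" "col2"
nrow(mydf)    # 7
mydf$col1[1]  # 104
```

For a very large file this can be wasteful, since the rows being thrown away still have to be parsed first.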

Upvotes: 0

MrFlick

Reputation: 206197

If you skip lines in a file, you skip each line completely, so if your header is on the first line and you skip 100 lines, the header line is skipped too. The first line that remains is then treated as the header, which is where the odd names come from. If you want to skip part of the file and still keep the headers, you'll need to read them separately:

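# Read only the header line plus one row, just to capture the column names
headers <- names(read.csv("mycsvfile.csv", nrows=1))
# skip=100 skips the header line and the first 99 data rows; reuse the saved names
mydf <- read.csv("mycsvfile.csv", header=FALSE, col.names=headers, skip=100)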
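A small self-contained sketch of both the problem and this fix may help. The temporary file, its contents, and the skip count of 4 are assumptions chosen for illustration. The "X" prefix in the broken names comes from read.csv's default check.names=TRUE, which passes the header through make.names(), and names that start with a digit get an "X" prepended.

```r
# Throwaway CSV: one header line plus 10 data rows (illustrative data only).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(col1 = 101:110, col2 = 1:10), path, row.names = FALSE)

# skip counts physical lines, so the header goes too. The line "104,4"
# is then promoted to header and turned into syntactically valid names.
broken <- read.csv(path, skip = 4)
names(broken)  # "X104" "X4"

# Capture the real header first, then reuse it for the skipped read.
headers <- names(read.csv(path, nrows = 1))
fixed <- read.csv(path, header = FALSE, col.names = headers, skip = 4)

names(fixed)   # "col1" "col2"
nrow(fixed)    # 7 (rows 104..110; the header and 3 data rows were skipped)
```

This is exactly the pattern in the question: "X2297256" and "X3" are most likely just the values from line 101 of the file with "X" put in front.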

Upvotes: 8
