Reputation: 23
I am trying to read a CSV file in R (under Linux) using read.csv(). After the function completes, I find that the number of lines read into R is less than the number of lines in the CSV file (obtained with wc -l). Moreover, every time I read that specific CSV file, the same lines get skipped. I checked for formatting errors in the CSV file, but everything looks fine.
However, if I extract the skipped lines into another CSV file, R is able to read every line from that file.
I have not been able to find out what my problem could be. Any help is greatly appreciated.
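For reference, this is roughly how I am comparing the two counts (the file name here is just a placeholder):

df <- read.csv("myfile.csv")
nrow(df)                        # rows read by R (header excluded)
length(readLines("myfile.csv")) # all lines in the file, like wc -l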
Upvotes: 2
Views: 1173
Reputation: 263301
Here's an example of using count.fields to determine where to look and perhaps apply fixes. You have a modest number of lines that are 23 'fields' in width:
> table(count.fields("~/Downloads/bugs.csv", quote="", sep=","))
  2 23     30
502 10 136532
> table(count.fields("~/Downloads/bugs.csv", sep=","))
# Just wanted to see if removing quote-recognition would help.... It didn't.
    2  4 10 12  20  22 23 25  28     30
11308 24 20 33 642 251 10  2 170 124584
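For what it's worth, here is a tiny invented example of how quote recognition changes the counts (the temp file and its contents are made up for illustration):

tf1 <- tempfile(fileext = ".csv")
writeLines(c('x,"a,b",z', 'x,y,z'), tf1)
count.fields(tf1, sep = ",")             # 3 3 -- the quoted comma is not a separator
count.fields(tf1, sep = ",", quote = "") # 4 3 -- with quote recognition disabled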
> which(count.fields("~/Downloads/bugs.csv", quote="", sep=",") == 23)
[1] 104843 125158 127876 129734 130988 131456 132515 133048 136764
[10] 136765
I looked at those 23-field lines with:
txt <- readLines("~/Downloads/bugs.csv")[
    which(count.fields("~/Downloads/bugs.csv", quote="", sep=",") == 23)]
And they had octothorpes ("#", hash signs), which are comment characters in R data parlance.
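A small invented example of that effect; by default count.fields (and read.table) treat "#" as starting a comment, so the rest of the line is dropped unless comment.char is disabled:

tf2 <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,# oops,6"), tf2)
count.fields(tf2, sep = ",")                    # 3 3 2 -- "#" swallows the tail
count.fields(tf2, sep = ",", comment.char = "") # 3 3 3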
> table(count.fields("~/Downloads/bugs.csv", quote="", sep=",", comment.char=""))
    30
137044
So... use those settings in read.table and you should be "good to go".
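A sketch of the resulting call (header = TRUE is an assumption; adjust to your file):

dat <- read.table("~/Downloads/bugs.csv", sep = ",", quote = "",
                  comment.char = "", header = TRUE,
                  stringsAsFactors = FALSE)
nrow(dat)  # should now account for all 137044 lines (minus the header, if any)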
Upvotes: 11