Nitin Mohan

Reputation: 23

R Programming: read.csv() skips lines unexpectedly

I am trying to read a CSV file in R (on Linux) using read.csv(). After the function completes, the number of lines read into R is less than the number of lines in the CSV file (as reported by wc -l). Moreover, every time I read that particular file, the same lines are skipped. I checked the file for formatting errors but everything looks fine.

However, if I extract the skipped lines into another CSV file, R reads every line from that file without trouble.

I have not been able to find what the problem could be. Any help is greatly appreciated.
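For reference, a minimal way to compare the on-disk line count with what read.csv() sees. This is a hypothetical file, not the asker's data; a stray quote character is one known way to trigger exactly this symptom, because quoted fields may legally span lines:

```r
# Minimal sketch (hypothetical file, not the real data): an unmatched
# quote makes read.csv() silently fold the following lines into one
# quoted field, so nrow() comes up short of wc -l.
f <- tempfile(fileext = ".csv")
writeLines(c("x,y",
             "1,\"a",   # stray quote: swallows the remaining lines
             "2,b",
             "3,c"), f)
length(readLines(f))                     # 4 physical lines, like wc -l
nrow(suppressWarnings(read.csv(f)))      # fewer than the 3 data rows expected
```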

Upvotes: 2

Views: 1173

Answers (1)

IRTFM

Reputation: 263301

Here's an example of using count.fields to determine where to look and perhaps apply fixes. You have a modest number of lines that are 23 'fields' in width:

> table(count.fields("~/Downloads/bugs.csv", quote="", sep=","))
     2     23     30 
   502     10 136532 
> table(count.fields("~/Downloads/bugs.csv", sep=","))
# Just wanted to see if removing quote-recognition would help.... It didn't.
     2      4     10     12     20     22     23     25     28     30 
 11308     24     20     33    642    251     10      2    170 124584 
> which(count.fields("~/Downloads/bugs.csv", quote="", sep=",") == 23)
 [1] 104843 125158 127876 129734 130988 131456 132515 133048 136764
[10] 136765

I looked at the 23-field lines with:

txt <- readLines("~/Downloads/bugs.csv")[
                 which(count.fields("~/Downloads/bugs.csv", quote="", sep=",") == 23)]

And they contained octothorpes ("#", hash signs), which is the default comment character in R's data-reading functions.

> table(count.fields("~/Downloads/bugs.csv", quote="", sep=",", comment.char=""))
    30 
137044 

So.... use those settings in read.table and you should be "good to go".
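The comment-character effect is easy to reproduce on a toy file (hypothetical data, not the original bugs.csv): count.fields() and read.table() default to comment.char = "#", so a hash inside a field truncates the line, and setting comment.char = "" (plus quote = "") restores the full field counts:

```r
# Toy reproduction (hypothetical data, not the original bugs.csv).
f <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,bug #123,6"), f)
table(count.fields(f, sep = ","))                      # last line counts short
table(count.fields(f, sep = ",", comment.char = ""))   # every line: 3 fields
dat <- read.table(f, sep = ",", quote = "", comment.char = "",
                  header = TRUE, stringsAsFactors = FALSE)
nrow(dat)                                              # 2 rows, as expected
```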

Upvotes: 11
