leoce

Reputation: 735

R read.table: "too many items"

I have a 53 GB file. Here is its head:

1   10  2873
1   100 22246
1   1000    28474
1   10000   35663
1   10001   35755
1   10002   35944
1   10003   36387
1   10004   36453
1   10005   36758
1   10006   37240

I'm running R 3.3.2 on a 64-bit CentOS 7 server with 128 GB of RAM. I've already read 4098 similar files into R, but I can't read the largest one.

df <- read.table(f, header=FALSE, col.names=c('a', 'b', 'dist'), sep='\t', quote='', comment.char='')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  too many items

It returns an error saying "too many items". Then I followed this tip:

df5rows <- read.table(f, nrows=5, header=FALSE, col.names=c('a', 'b', 'dist'), sep='\t', quote='', comment.char='')
classes <- sapply(df5rows, class)
df <- read.table(f, nrows=3231959401, colClass=classes, header=FALSE, col.names=c('a', 'b', 'dist'), sep='\t', quote='', comment.char='')

It still fails with "too many items", now with an added warning that NAs were introduced. I also tried without colClasses, with the same result:

df <- read.table(f, nrows=3231959401, header=FALSE, col.names=c('a', 'b', 'dist'), sep='\t', quote='', comment.char='')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  too many items
In addition: Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  NAs introduced by coercion to integer range

Memory usage never went over 90 GB without nrows or colClasses, and never over 60 GB with those arguments. I don't understand why R can't read the file.

I've also checked that there's no line with 4 or more columns.
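One detail worth checking: the nrows value used above, 3231959401, is larger than the biggest value an R integer can hold, which is a likely source of the "NAs introduced by coercion to integer range" warning. A minimal sketch of that check follows; the variable names are only for illustration, and it does not by itself explain the "too many items" error.

```r
# Largest value an R integer can hold (2^31 - 1).
int_max <- .Machine$integer.max

# The row count passed as nrows in the calls above.
n_rows <- 3231959401

# Does the requested row count fit in an integer?
exceeds_int <- n_rows > int_max

# Coercing an out-of-range double to integer yields NA (with a warning,
# suppressed here so the script runs quietly).
coerced <- suppressWarnings(as.integer(n_rows))

print(int_max)
print(exceeds_int)
print(is.na(coerced))
```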

Upvotes: 3

Views: 681

Answers (1)

Damien Cormann

Reputation: 189

Did you try splitting the file with a lightweight tool such as sed or vi? Then you just have to merge the two datasets. I ran into the same problem with a big file on a very similar machine. It was caused by a junk line; with files this large, that kind of error does occur.
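If you would rather stay in R, here is a rough sketch of how one might hunt for such a junk line without loading the whole file. It reads the file in chunks and reports the line numbers that do not look like valid rows. The helper name find_junk_lines, the chunk size, and the rule that every field must be a non-negative integer are all assumptions based on the head shown in the question, so adjust them to the real data.

```r
# Hypothetical helper: scan a tab-separated file chunk by chunk and return
# the numbers of lines that do not have exactly `n_fields` fields made up
# only of digits.
find_junk_lines <- function(path, n_fields = 3L, chunk_size = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))
  bad <- numeric(0)  # doubles, because line numbers may exceed integer range
  offset <- 0        # number of lines consumed so far
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    fields <- strsplit(lines, "\t", fixed = TRUE)
    ok <- vapply(fields, function(x) {
      length(x) == n_fields && all(grepl("^[0-9]+$", x))
    }, logical(1))
    bad <- c(bad, offset + which(!ok))
    offset <- offset + length(lines)
  }
  bad
}

# Small demonstration on a temporary file with two deliberately broken lines.
tmp <- tempfile()
writeLines(c("1\t10\t2873", "1\t100\t22246", "1\t1000",
             "x\ty\tz", "1\t10000\t35663"), tmp)
junk <- find_junk_lines(tmp, chunk_size = 2)
print(junk)
```

On the real file you would call find_junk_lines(f) with the default chunk size and inspect or drop the reported lines before trying read.table again.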

Upvotes: 1
