Reputation: 728
I am trying to use fread() to read in a table of 2 columns (x, y) and ~3 00 million rows (62 GB) and plot x against y in a scatter plot. It works fine if I use only a small portion of the data, such as 30000 rows.
But if I run it on the whole data set, I get:
"Warning message:
In setattr(ans, "row.names", .set_row_names(nr)) :
NAs introduced by coercion to integer range
/var/spool/torque/mom_priv/jobs/11244921.cri16sc001.SC: line 14: 70765 Killed Rscript 10_plotZ0Z1.R"
What could be the reason?
Upvotes: 0
Views: 651
Reputation: 6446
You could sample your big file, as already suggested in the comments. Unfortunately, it seems that fread does not yet have such a feature implemented; see this open issue (upvoting the feature could motivate the developers to work on it). But as mentioned here, if you are on Linux, you can try the shuf -n shell command:
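library(data.table)
# Generate some random data
dt <- data.table(x = rnorm(10^6), y = rnorm(10^6))
# Write to csv file
fwrite(dt, "test-dt.csv")
# Read a random sample of 10^5 rows; tail -n +2 drops the header line so
# shuf cannot sample it, and col.names restores the column names
dt2 <- fread(cmd = "tail -n +2 test-dt.csv | shuf -n 100000", col.names = c("x", "y"))
# Plot the sample, not the full table
dt2[, plot(x, y)]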
Alternatively, you could read blocks of rows from your file with multiple calls to fread, as shown here.
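A minimal sketch of the block-reading idea, assuming a comma-separated file with one header line and two columns named x and y. The file here is a small temporary stand-in, and chunk_size, per_chunk and n_rows are illustrative names and values, not anything from the original post. Each call uses fread's skip and nrows arguments to read one block, and only a small random subset of each block is kept for plotting, so the full table is never held in memory at once:

```r
library(data.table)

# Hypothetical stand-in for the real file: 1000 rows, header line "x,y"
csv <- tempfile(fileext = ".csv")
fwrite(data.table(x = rnorm(1000), y = rnorm(1000)), csv)

n_rows     <- 1000L  # known here; for a real file, count the lines first (e.g. wc -l)
chunk_size <- 300L   # rows per fread call (assumed value; tune to available memory)
per_chunk  <- 50L    # rows kept from each block for the plot (assumed value)

kept   <- list()
n_read <- 0L
for (start in seq(0L, n_rows - 1L, by = chunk_size)) {
  # skip = start + 1 jumps over the header plus the rows already read
  chunk <- fread(csv,
                 skip      = start + 1L,
                 nrows     = min(chunk_size, n_rows - start),
                 header    = FALSE,
                 col.names = c("x", "y"))
  n_read <- n_read + nrow(chunk)
  # Keep a random subset of this block, then let the block be overwritten
  kept[[length(kept) + 1L]] <- chunk[sample(.N, min(.N, per_chunk))]
}

# Combine the per-block subsets into one small table for plotting
plot_dt <- rbindlist(kept)
```

The result can then be plotted with plot_dt[, plot(x, y)]. Note that every call has to locate its starting line again, so many small blocks on a very large file may be slow; larger blocks are probably the better trade-off if memory allows.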
Upvotes: 2