Phoenix Mu

Reputation: 728

Warning "NAs introduced by coercion" from the fread function

I am trying to use fread() to read a table with 2 columns (x, y) and ~3 00 million rows (62 GB), and then plot x against y in a scatter plot. It works fine if I only use a small portion of the data, e.g. 30000 rows.

But when I run it on the whole data set, I get:

Warning message:
In setattr(ans, "row.names", .set_row_names(nr)) :
  NAs introduced by coercion to integer range
/var/spool/torque/mom_priv/jobs/11244921.cri16sc001.SC: line 14: 70765 Killed Rscript 10_plotZ0Z1.R

What could be the reason?
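The warning text itself can be reproduced in base R. It is what as.integer() emits for a value outside R's 32-bit integer range (.Machine$integer.max, i.e. 2^31 - 1 = 2147483647). The failing call, setattr(ans, "row.names", .set_row_names(nr)), stores the row count nr as compact integer row names. A plausible reading, which the thread does not confirm, is that the number of rows fread counted exceeded that limit. The row count below is a hypothetical value chosen only to trigger the warning:

```r
# R integers are 32-bit signed, so the largest representable value is 2^31 - 1
int_limit <- .Machine$integer.max
print(int_limit)                 # 2147483647

# Hypothetical row count above that limit, for illustration only
n_rows <- 3e9

# Coercing it to integer yields NA and the warning quoted in the question;
# tryCatch() captures the warning text instead of printing it
msg <- tryCatch(as.integer(n_rows),
                warning = function(w) conditionMessage(w))
print(msg)                       # "NAs introduced by coercion to integer range"
print(n_rows > int_limit)        # TRUE
```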

Upvotes: 0

Views: 651

Answers (1)

Valentin_Ștefan

Reputation: 6446

You could sample your big file, as already suggested in the comments. Unfortunately, fread doesn't seem to have such a feature implemented yet; see this open issue (upvoting it could motivate the developers to work on it). But as mentioned here, if you are on Linux, you can try the shuf -n shell command:

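library(data.table)

# Generate some random data
dt <- data.table(x = rnorm(10^6), y = rnorm(10^6))
# Write it to a csv file
fwrite(dt, "test-dt.csv")

# Read a random sample of 10^5 rows; drop the header line first so shuf
# cannot sample it, then restore the column names
dt2 <- fread(cmd = "tail -n +2 test-dt.csv | shuf -n 100000",
             header = FALSE, col.names = c("x", "y"))
# Plot the sample, not the full data set
dt2[, plot(x, y)]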

Alternatively, you could read the file in blocks of rows with multiple calls to fread, as shown here.
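A minimal sketch of the block-wise approach. It assumes the file has a single header line and that a random fraction of each block is enough for the scatter plot. The file, block_size and keep_frac are hypothetical values for illustration; only fread's nrows, skip, header and col.names arguments are used. A small demo file is written first so the sketch runs on its own:

```r
library(data.table)

# Demo file so the sketch runs on its own; point csv_file at your real data
set.seed(1)
csv_file <- tempfile(fileext = ".csv")
fwrite(data.table(x = rnorm(10^6), y = rnorm(10^6)), csv_file)

block_size <- 250000   # rows per fread call (hypothetical, tune to your RAM)
keep_frac  <- 0.01     # fraction of each block kept for plotting (hypothetical)

# Column names come from the header line, read once
col_names <- names(fread(csv_file, nrows = 0))

pieces <- list()
n_done <- 0
repeat {
  # Skip the header line plus every data row already processed.
  # fread() raises an error once skip runs past the end of the file,
  # which tryCatch() turns into NULL (note: it hides other read errors too)
  block <- tryCatch(
    fread(csv_file, skip = n_done + 1, nrows = block_size,
          header = FALSE, col.names = col_names),
    error = function(e) NULL
  )
  if (is.null(block) || nrow(block) == 0) break
  # Keep a random subset of this block so memory use stays bounded
  pieces[[length(pieces) + 1]] <- block[sample(.N, ceiling(keep_frac * .N))]
  n_done <- n_done + nrow(block)
  if (nrow(block) < block_size) break
}

sampled <- rbindlist(pieces)
sampled[, plot(x, y, pch = ".")]
```

Each call has to scan past the skipped lines before it starts reading, so on a 62 GB file this is likely much slower than the shuf approach. Its advantage is that only one block plus the kept rows need to be in memory at any time. Since the tryCatch() also swallows unrelated errors (for example a wrong path), it is worth checking n_done against the expected row count.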

Upvotes: 2
