Bogaso

Reputation: 3308

data.table::fread is failing to read data using cmd argument

I found that fread cannot read a CSV file from disk when I use the cmd argument. For example, when I run the line below in a terminal, I get the expected rows:

# grep -w 800,[0-9][0-9][0-9][0-9] 'File.csv'
800,2020-05-20,4610.1
800,2020-05-19,4670 

But fread fails to read this data:

library(data.table)
fread(cmd = "grep -w 800,2020,[0-9][0-9][0-9][0-9] 'File.csv'")
Error in fread(cmd = "grep -w 800,2020,[0-9][0-9][0-9][0-9] 'File.csv'") : 
  File '/tmp/RtmpweexR2/filee0e134867a6' does not exist or is non-readable. getwd()=='/root'
In addition: Warning messages:
1: In (if (.Platform$OS.type == "unix") system else shell)(paste0("(",  :
  system call failed: Cannot allocate memory
2: In (if (.Platform$OS.type == "unix") system else shell)(paste0("(",  :
  error in running command

I don't see any memory problems when running other R code.

Part of sessionInfo() -

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

data.table version - data.table_1.12.8

Any pointers on why this is happening would be very helpful.

Upvotes: 2

Views: 2398

Answers (1)

r2evans

Reputation: 160437

That error appears to occur because the command produced no output (I'm not sure of all the mechanics, but I can reproduce it with similar commands). R's system (and therefore also system2 and shell) handles quoting and whitespace delimiting poorly, so calls that rely on it (including fread(cmd=..)) are prone to odd failures with commands that work fine on the command line.

In this case, I suspect that the empty result is the problem, and that there is a small inconsistency between your data and your command. For instance, if I create a File.csv with your sample data and run the grep,

system("grep -w 800,[0-9][0-9][0-9][0-9] 'File.csv'")
# 800,2020-05-20,4610.1
# 800,2020-05-19,4670 
# [1] 0
fread(cmd="grep -w 800,[0-9][0-9][0-9][0-9] 'File.csv'")
#       V1         V2     V3
#    <int>     <IDat>  <num>
# 1:   800 2020-05-20 4610.1
# 2:   800 2020-05-19 4670.0

it works. But if I use your second command, it fails:

fread(cmd = "grep -w 800,2020,[0-9][0-9][0-9][0-9] 'File.csv'")
# Warning in (if (.Platform$OS.type == "unix") system else shell)(paste0("(",  :
#   '(grep -w 800,2020,[0-9][0-9][0-9][0-9] 'File.csv') > C:\Users\r2\AppData\Local\Temp\RtmpMVUOli\filed30647c664a' execution failed with error code 1
# Warning in fread(cmd = "grep -w 800,2020,[0-9][0-9][0-9][0-9] 'File.csv'") :
#   File 'C:\Users\r2\AppData\Local\Temp\RtmpMVUOli\filed30647c664a' has size 0. Returning a NULL data.table.
# Null data.table (0 rows and 0 cols)

The pattern requires 800,2020, followed by four digits, but in the file 2020 is followed by -05-20 rather than a comma, so no line matches. If I add a row to the data that supports that pattern, though,

fread(cmd = "grep -w 800,2020,[0-9][0-9][0-9][0-9] 'File.csv'")
#       V1    V2    V3
#    <int> <int> <int>
# 1:   800  2020  1234
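
To see why the second pattern finds nothing in the original two rows, the same regular expressions can be checked in R without involving grep or fread at all. This is only a rough sketch: grepl does not apply grep's -w word-boundary rule, and the variable names are made up for illustration.

```r
# The two sample rows from the question (trailing space kept as posted).
sample_rows <- c("800,2020-05-20,4610.1", "800,2020-05-19,4670 ")

# Pattern from the terminal command: "800," followed by four digits.
first_pattern_hits <- grepl("800,[0-9][0-9][0-9][0-9]", sample_rows)

# Pattern from the fread call: "800,2020," followed by four digits.
# In the data, 2020 is followed by "-", not ",", so nothing can match.
second_pattern_hits <- grepl("800,2020,[0-9][0-9][0-9][0-9]", sample_rows)

print(first_pattern_hits)
print(second_pattern_hits)
```

If the second vector has no TRUE values, grep will exit with status 1 and write nothing, which is the situation fread(cmd=) then trips over.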

I think this might be a bug somewhere in the chain; I'm not sure whether it's system's handling of a command that returns no data, or fread's handling of a missing or empty file.
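
One possible workaround, sketched under some assumptions, is to run grep yourself and only hand the result to fread when there is something to parse. The helper name fread_grep is hypothetical (it is not part of data.table), and the sketch assumes a Unix-like system with grep on the PATH. It relies on grep's convention of exiting with status 1 when no line matches.

```r
library(data.table)

# Hypothetical helper, not part of data.table: run grep on its own so that
# "no lines matched" can be told apart from a genuine failure.
fread_grep <- function(pattern, file) {
  args <- c("-w", shQuote(pattern), shQuote(file))
  # stdout = TRUE captures the matched lines; a non-zero exit code is attached
  # as the "status" attribute (with a warning, which is silenced here).
  out <- suppressWarnings(system2("grep", args, stdout = TRUE))
  status <- attr(out, "status")
  if (is.null(status)) status <- 0L

  if (status == 1L) return(data.table())  # grep exit code 1: no match
  if (status != 0L) stop("grep failed with exit code ", status)

  # Join into one newline-terminated string so fread treats it as data,
  # not as a file name.
  fread(text = paste0(paste(out, collapse = "\n"), "\n"), header = FALSE)
}

# Usage with a temporary copy of the sample data from the question.
csv <- tempfile(fileext = ".csv")
writeLines(c("800,2020-05-20,4610.1", "800,2020-05-19,4670 "), csv)

matched   <- fread_grep("800,[0-9][0-9][0-9][0-9]", csv)
unmatched <- fread_grep("800,2020,[0-9][0-9][0-9][0-9]", csv)
```

With this approach an empty match comes back as an empty data.table instead of a warning or error about a temp file. It does not address the "Cannot allocate memory" warning from the question, which this sketch makes no attempt to diagnose.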

Upvotes: 2
