R package: read in data via system.file() and read.table() from R data.table

Question

I'm creating an R package with several files in /data. The usual way to locate files that ship with an R package is system.file():

system.file(..., package = "base", lib.loc = NULL, mustWork = FALSE)
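Here is a minimal base-R sketch of how system.file() behaves. The stats package is used only because it is always installed. system.file() returns the full path when the file exists and an empty string "" when it does not. Passing that "" on to read.table() or fread() produces confusing errors rather than a "file not found" message. Setting mustWork = TRUE makes the missing file an error right away.

```r
# Full path of a file that exists inside an installed package
# (stats is used here only because it is always installed).
found <- system.file("DESCRIPTION", package = "stats")
print(found)

# A file that does not exist yields "" instead of an error.
not_found <- system.file("no_such_file.txt", package = "stats")
print(not_found)

# With mustWork = TRUE the same lookup raises an error immediately,
# so an empty path never reaches read.table() or fread().
res <- tryCatch(
  system.file("no_such_file.txt", package = "stats", mustWork = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)
```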

The file in /data that I would like to load into a data.table is a gzipped text file, my_file.txt.gz. How do I load it into a data.table via read.table() or fread()?

Within the R script, I tried:

#' @import data.table
#' @export
my_function = function(){

    my_table = read.table(system.file("data", "my_file.txt.gz", package = "FusionVizR"), header=TRUE)    

}

This leads to an error when running devtools::document():

Error in read.table(system.file("data", "my_file.txt.gz", package = "FusionVizR"), header = TRUE) (from script1.R#7) : 
  no lines available in input
In addition: Warning message:
In file(file, "rt") :
  file("") only supports open = "w+" and open = "w+b": using the former

I get the same issue with fread():

#' @import data.table
#' @export
my_function = function(){

    my_table = fread(system.file("data", "my_file.txt.gz", package = "FusionVizR"), header=TRUE)    

}

This outputs the error:

Input is either empty or fully whitespace after the skip or autostart. Run again with verbose=TRUE.

So it appears that system.file() doesn't return a usable path to the file that I could load into a data.table. How do I do this?

Dirk is no longer here · Accepted Answer

Do yourself a HUGE favour and study fread() closely: it is one of the very best features in data.table. I have examples (at work) of reading from a pipe of other commands, of reading compressed data, and more.

Here is a simple mock example:

R> write.csv(iris, file="/tmp/demo.csv")
R> system("gzip /tmp/demo.csv")  # to be very plain
R> fread("zcat /tmp/demo.csv.gz")
      V1 Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:   1          5.1         3.5          1.4         0.2    setosa
  2:   2          4.9         3.0          1.4         0.2    setosa
  3:   3          4.7         3.2          1.3         0.2    setosa
  4:   4          4.6         3.1          1.5         0.2    setosa
  5:   5          5.0         3.6          1.4         0.2    setosa
 ---                                                                
146: 146          6.7         3.0          5.2         2.3 virginica
147: 147          6.3         2.5          5.0         1.9 virginica
148: 148          6.5         3.0          5.2         2.0 virginica
149: 149          6.2         3.4          5.4         2.3 virginica
150: 150          5.9         3.0          5.1         1.8 virginica
R>

It seems that in my haste I wrote one column too many (the row names), but you get the idea.
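The sketch below does the same round trip without the extra column. It assumes a Unix-like system with zcat on the PATH and a recent data.table. Setting row.names = FALSE in write.csv() drops the row names. Shell commands go in fread()'s dedicated cmd= argument, which recent data.table versions provide. Writing through a gzfile() connection is only a portable stand-in for the gzip call above.

```r
library(data.table)

# Write iris without row names, so no extra V1 column appears,
# compressing it through a gzfile() connection.
gz <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz, "w")
write.csv(iris, con, row.names = FALSE)
close(con)

# Pass the decompression command via cmd= (assumes zcat is available).
dt <- fread(cmd = paste("zcat", shQuote(gz)))
print(names(dt))
```

If the R.utils package is installed, fread() can also read the .gz file directly, as in fread(gz).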

Now, you don't even need fread (but it is still more powerful than the alternatives):

R> head(read.csv(file="/tmp/demo.csv.gz"))
  X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 1          5.1         3.5          1.4         0.2  setosa
2 2          4.9         3.0          1.4         0.2  setosa
3 3          4.7         3.2          1.3         0.2  setosa
4 4          4.6         3.1          1.5         0.2  setosa
5 5          5.0         3.6          1.4         0.2  setosa
6 6          5.4         3.9          1.7         0.4  setosa
R>

R figured out by itself that it needed to decompress the file.
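Here is a base-R check of that behaviour, as a sketch. The file below really is gzip-compressed, since it starts with the gzip magic bytes 1f 8b. Yet read.csv() reads it without an explicit gzfile(), because file() detects compressed files when opening them for text-mode reading. The raw = TRUE connection is used only to inspect the bytes without that detection.

```r
gz <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz, "w")
write.csv(iris, con, row.names = FALSE)
close(con)

# Inspect the first two bytes as stored on disk; gzip files start with 0x1f 0x8b.
raw_con <- file(gz, "rb", raw = TRUE)
magic <- readBin(raw_con, what = "raw", n = 2)
close(raw_con)
print(magic)

# read.csv() -> read.table() -> file(gz, "rt"), which decompresses transparently.
df <- read.csv(gz)
print(dim(df))
```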

Edit: I was editing this question earlier when it was deleted under me, which is about as de-motivating as it gets. In a nutshell:

system.file() works: e.g. file <- system.file("rawdata", "population.csv", package="gunsales") returns the complete path, "/usr/local/lib/R/site-library/gunsales/rawdata/population.csv", because the file exists. But this is easy to mess up. (Needless to say, I do have the package and the file.)
Look into the data/ directory and what Writing R Extensions says about it. It is a good mechanism.
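Below is a minimal sketch of both routes, under stated assumptions. Files in data/ are meant to be datasets. R reads plain-text tables there with read.table(..., header = TRUE), and with LazyData: true it stores them in a lazy-load database at install time. So there is no guarantee that data/my_file.txt.gz exists as a file in the installed package, which would fit the empty path seen in the question.

- **Option 1** keeps the raw file under inst/extdata/ and reads it by path. inst/extdata/ is a common convention. Anything under inst/ is copied to the top level of the installed package, just as gunsales' rawdata/ is.
- **Option 2** leaves the file in data/ and loads it by name with data().

Both helper names are hypothetical. FusionVizR and my_file come from the question. The package is assumed to list data.table under Imports. Option 1 is not run here because FusionVizR is not installed.

```r
# Option 1 (hypothetical helper): the raw file is stored in the package
# sources as inst/extdata/my_file.txt.gz and installed as extdata/my_file.txt.gz.
read_my_file_raw <- function() {
  path <- system.file("extdata", "my_file.txt.gz",
                      package = "FusionVizR", mustWork = TRUE)
  # read.table() decompresses gzip input on its own;
  # as.data.table() converts the resulting data.frame.
  data.table::as.data.table(utils::read.table(path, header = TRUE))
}

# Option 2 (hypothetical helper): load a dataset from a package's data/
# directory by its name (the file name without extensions), not by path.
load_pkg_dataset <- function(name, package) {
  env <- new.env()
  utils::data(list = name, package = package, envir = env)
  env[[name]]
}

# Option 2 works with any installed package, e.g. iris from datasets:
iris_df <- load_pkg_dataset("iris", "datasets")
print(dim(iris_df))

# For the question's package (not run here):
# my_table <- data.table::as.data.table(load_pkg_dataset("my_file", "FusionVizR"))
```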
