chinsoon12

Reputation: 25223

Converting all data.frames in environment to data.tables

I get a warning when I use := right after converting all data.frames to data.tables:

library(data.table) #Win R-3.5.1 x64 data.table_1.12.2
df1 <- data.frame(A=1, B=2)
df2 <- data.frame(D=3)
lapply(mget(ls()), function(x) {
    if (is.data.frame(x)) {
        setDT(x)
    }
})
df1[, rn:=.I]

Warning message:
In `[.data.table`(df1, , `:=`(rn, .I)) :
  Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.

The following also generates the same warning:

df3 <- data.frame(E=3)
df4 <- data.frame(FF=4)
for (l in list(df3, df4)) setDT(l)
df3[, rn:=.I]

Converting them one by one works, but it is tedious:

df5 <- data.frame(G=5)
setDT(df5)
df5[, rn := .I]    # no warning

What is the idiomatic way to convert all data.frames to data.tables?

Related:

  1. Using setDT inside a function
  2. Invalid .internal.selfref in data.table

Upvotes: 7

Views: 928

Answers (4)

Sweepy Dodo

Reputation: 1873

Not my answer: I remember seeing this somewhere but can't find the original post, so I can't link to it. Note that the \(i) lambda shorthand requires R 4.1.0 or later.

x <- Filter(\(i) is.data.frame(eval(as.name(i))), ls())
lapply(x, \(i) setDT(get(i)))

Upvotes: 1

Andrew

Reputation: 5138

A little late, but this seems like a great and rare use case for eapply(), together with list2env(). Of course, this is just another option; I am certainly not asserting that it is the idiomatic way.

library(data.table)
df1 <- data.frame(A=1, B=2)
df2 <- data.frame(D=3)

list2env(eapply(.GlobalEnv, function(x) {
    if (is.data.frame(x)) setDT(x) else x
}), .GlobalEnv)

df1[, rn:=.I]
df1
   A B rn
1: 1 2  1

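The same idea can be wrapped in a reusable function that takes the target environment as an argument and only rebinds data.frames. This is a minimal sketch: setDT_env is a hypothetical name, not part of data.table. Like the eapply()/list2env() version, it calls setDT() on a local name and then assigns the converted table back under the original name.

```r
library(data.table)

# Hypothetical helper (not part of data.table): convert every data.frame
# bound in `env` to a data.table and rebind it under the same name.
setDT_env <- function(env = parent.frame()) {
  for (nm in ls(env)) {
    x <- get(nm, envir = env, inherits = FALSE)
    if (is.data.frame(x) && !is.data.table(x)) {
      setDT(x)                    # converts the local binding `x` by reference
      assign(nm, x, envir = env)  # point the original name at the converted table
    }
  }
  invisible(NULL)
}

# Usage: convert all data.frames in the calling (here: global) environment
df1 <- data.frame(A = 1, B = 2)
df2 <- data.frame(D = 3)
setDT_env()
df1[, rn := .I]
```

Because parent.frame() is evaluated as a default argument, calling setDT_env() with no arguments targets the environment it was called from; pass an environment explicitly (e.g. setDT_env(some_env)) to convert objects elsewhere.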
Some timings (a single run of each method):

set.seed(0L)
sz <- 1e7
df1 <- data.frame(A=rnorm(sz))
df2 <- data.frame(B=rnorm(sz))
df3 <- copy(df1)
df4 <- copy(df2)

microbenchmark::microbenchmark(unit="ms", times=1L,
    assign_mtd = {
        for (x in ls()) {
            if (is.data.frame(get(x))) {
                assign(x, as.data.table(get(x)))
            }
        }
    },
    eval_sub_mtd = {
        for(x in ls()){
            if (is.data.frame(get(x))) {
                eval(substitute(setDT(x), list(x=as.name(x))))
            }
        }
    },
    eapply_mtd = {
        list2env(eapply(.GlobalEnv, function(x) {
                if (is.data.frame(x)) setDT(x) else x
            }), .GlobalEnv)
    }
)

Timings:

Unit: milliseconds
         expr        min         lq       mean     median         uq        max neval
   assign_mtd 115.922802 115.922802 115.922802 115.922802 115.922802 115.922802     1
 eval_sub_mtd   3.293358   3.293358   3.293358   3.293358   3.293358   3.293358     1
   eapply_mtd   1.913802   1.913802   1.913802   1.913802   1.913802   1.913802     1

Upvotes: 3

Frank

Reputation: 66819

setDT operates on the name/symbol, while get returns the value of the object. You can construct the setDT expression and evaluate it:

library(data.table) 
df1 <- data.frame(A=1, B=2)
df2 <- data.frame(D=3)
for(x in ls()){
  if (is.data.frame(get(x))) {
    eval(substitute(setDT(x), list(x=as.name(x))))
  }
}
rm(x)
df1[, rn:=.I]

I would use a loop rather than lapply to avoid complications (e.g., with the evaluation environment).
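To see what eval(substitute(...)) is doing, you can build the call for one object name without evaluating it. Base R's bquote() is shown as an alternative way to build the same call; it is swapped in here for illustration and is not part of the answer above. Evaluating the constructed call at top level is then equivalent to typing setDT(df1) by hand.

```r
library(data.table)

df1 <- data.frame(A = 1, B = 2)
x <- "df1"

# Two ways to turn the string "df1" into the call setDT(df1)
call1 <- substitute(setDT(x), list(x = as.name(x)))
call2 <- bquote(setDT(.(as.name(x))))
print(call1)   # setDT(df1)

# Evaluating the call in the global environment converts df1 by name
eval(call2)
df1[, rn := .I]
```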

Upvotes: 5

thothal

Reputation: 20409

This should do the trick:

library(data.table) #Win R-3.5.1 x64 data.table_1.12.2
df1 <- data.frame(A=1, B=2)
df2 <- data.frame(D=3)
for (x in ls()) {
    if (is.data.frame(get(x))) {
        assign(x, as.data.table(get(x)))
    }
}
df1[, rn:=.I]

I guess (though I'm not sure) that the for/lapply loop somehow uses its own environment, which interferes with data.table's by-reference semantics.
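One thing to keep in mind with this approach: as.data.table() returns a new data.table whose columns are copied, whereas setDT() converts the object in place. The sketch below checks this with data.table's address() function, which reports where an object lives in memory; the object names and values are made up for illustration, and plain double vectors are used so that no ALTREP expansion is involved.

```r
library(data.table)

# Two identical data.frames with plain double columns
df_a <- data.frame(A = c(1, 2, 3))
df_b <- data.frame(A = c(1, 2, 3))
addr_a <- address(df_a$A)
addr_b <- address(df_b$A)

dt_a <- as.data.table(df_a)  # new data.table; the column is copied
setDT(df_b)                  # df_b is converted in place

address(dt_a$A) == addr_a    # expected FALSE: different column vector
address(df_b$A) == addr_b    # expected TRUE: same column vector as before
```

For small tables the extra copy hardly matters, but for large ones it costs both time and memory.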

Upvotes: 3
