Reputation: 25223
I get a warning when I use := right after converting all data.frames to data.tables:
library(data.table) #Win R-3.5.1 x64 data.table_1.12.2
df1 <- data.frame(A=1, B=2)
df2 <- data.frame(D=3)
lapply(mget(ls()), function(x) {
  if (is.data.frame(x)) {
    setDT(x)
  }
})
df1[, rn:=.I]
Warning message:
In `[.data.table`(df1, , `:=`(rn, .I)) :
  Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
The following generates the same warning:
df3 <- data.frame(E=3)
df4 <- data.frame(FF=4)
for (l in list(df3, df4)) setDT(l)
df3[, rn:=.I]
Converting them one at a time works, but it is tedious:
df5 <- data.frame(G=5)
setDT(df5)
df5[, rn := .I] #no warning
What is the idiomatic way to convert all data.frames to data.tables?
Upvotes: 7
Views: 928
Reputation: 1873
This is not my own answer. I remember seeing it somewhere but cannot find the original post, so I cannot link to it.
x <- Filter(\(i) is.data.frame(eval(as.name(i))), ls())
lapply(x, \(i) setDT(get(i)))
Upvotes: 1
Reputation: 5138
A little late, but this seems like a great (and rare) use of eapply(), along with list2env(). Of course, this is just another option; I am not asserting that it is the idiomatic way.
library(data.table)
df1 <- data.frame(A=1, B=2)
df2 <- data.frame(D=3)
list2env(eapply(.GlobalEnv, function(x) {if(is.data.frame(x)) {setDT(x)} else {x}}), .GlobalEnv)
df1[, rn:=.I]
df1
A B rn
1: 1 2 1
Some timings:
set.seed(0L)
sz <- 1e7
df1 <- data.frame(A=rnorm(sz))
df2 <- data.frame(B=rnorm(sz))
df3 <- copy(df1)
df4 <- copy(df2)
microbenchmark::microbenchmark(unit="ms", times=1L,
  assign_mtd = {
    for (x in ls()) {
      if (is.data.frame(get(x))) {
        assign(x, as.data.table(get(x)))
      }
    }
  },
  eval_sub_mtd = {
    for (x in ls()) {
      if (is.data.frame(get(x))) {
        eval(substitute(setDT(x), list(x=as.name(x))))
      }
    }
  },
  eapply_mtd = {
    list2env(eapply(.GlobalEnv, function(x) {
      if (is.data.frame(x)) setDT(x) else x
    }), .GlobalEnv)
  }
)
Timings:
Unit: milliseconds
         expr        min         lq       mean     median         uq        max neval
   assign_mtd 115.922802 115.922802 115.922802 115.922802 115.922802 115.922802     1
 eval_sub_mtd   3.293358   3.293358   3.293358   3.293358   3.293358   3.293358     1
   eapply_mtd   1.913802   1.913802   1.913802   1.913802   1.913802   1.913802     1
Upvotes: 3
Reputation: 66819
setDT operates on the name/symbol, while get returns the value of the object. You can construct the setDT expression and evaluate it:
library(data.table)
df1 <- data.frame(A=1, B=2)
df2 <- data.frame(D=3)
for (x in ls()) {
  if (is.data.frame(get(x))) {
    eval(substitute(setDT(x), list(x=as.name(x))))
  }
}
rm(x)
df1[, rn:=.I]
I would use a loop rather than lapply to avoid complications (e.g., with the evaluating environment).
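If the loop is needed in more than one place, the same idea can be wrapped in a small function. The following is only a sketch: set_all_dt is a hypothetical helper name (it is not part of data.table), and it assumes the data.frames are ordinary, unlocked bindings in the target environment. Evaluating the constructed setDT(<name>) call in that environment means setDT still sees a plain symbol there.

```r
library(data.table)

# Hypothetical helper (not part of data.table): convert every data.frame
# bound in `env` to a data.table by building the call setDT(<name>) and
# evaluating it in `env`, so that setDT receives a plain symbol.
set_all_dt <- function(env = globalenv()) {
  for (nm in ls(env)) {
    if (is.data.frame(get(nm, envir = env))) {
      eval(substitute(setDT(x), list(x = as.name(nm))), envir = env)
    }
  }
  invisible(NULL)
}

df1 <- data.frame(A = 1, B = 2)
df2 <- data.frame(D = 3)
set_all_dt()        # converts df1 and df2 in the global environment
df1[, rn := .I]     # add a column by reference
```

Because the loop variable lives inside the function, there is no leftover x to remove afterwards.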
Upvotes: 5
Reputation: 20409
This should do the trick:
library(data.table) #Win R-3.5.1 x64 data.table_1.12.2
df1 <- data.frame(A=1, B=2)
df2 <- data.frame(D=3)
for (x in ls()) {
  if (is.data.frame(get(x))) {
    assign(x, as.data.table(get(x)))
  }
}
df1[, rn:=.I]
My guess (I am not sure) is that the for/lapply loop uses a kind of environment of its own, which interferes with the by-reference semantics of data.table.
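One way to probe that guess is the rough sketch below. It assumes that setDT called inside a function rebinds only that function's local variable, which is consistent with the warning in the question but is not verified against data.table internals here. Under that assumption, handing the converted object back and assigning it to the original name keeps the caller's binding consistent. The name to_dt is a hypothetical helper, not a data.table function.

```r
library(data.table)

# Hypothetical helper: convert inside a function, then return the result.
to_dt <- function(x) {
  setDT(x)   # assumption: this rebinds the local name `x` only
  x          # hand the converted object back to the caller
}

df_a <- data.frame(A = 1, B = 2)
df_a <- to_dt(df_a)   # rebind the caller's name to the converted object
df_a[, rn := .I]      # add a column by reference
```

Unlike the assign/as.data.table loop above, this goes through setDT, and unlike a bare setDT inside lapply, the result is assigned back to the name.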
Upvotes: 3