Reputation: 6496
I'm having a very simple problem, but have not found a way out of it. I was given about 200 tables from different stores, and have to analyze the whole body of data as a unit. As the tables are fairly big, I'm working with data.table::fread
. My plan is loading each table, rbind
ing them together and proceed with the analysis. In order to retain which store is which, I'd like to add a variable store to each table, with the name of the store(same as the table). For that I planned to run a for loop with the names of the tables and create the new variable.
# I'll create some dummy data tables here. In reality they're loaded with fread
library(data.table)
centro <- data.table(x=letters[1:5], y = 1:5)
sur <- data.table(x=letters[2:7], y = 3:8)
...
norte <- data.table(x=letters[2:10], y = 3:11)
I need each table to have a variable "store" whose value is "centro", "sur" or "norte", depending on the store it belongs to. So I tried:
st <- c("centro", "sur", "norte")
for (i in st){
i[, store := i]
}
Which, of course, didn't work (it complains that "i is not a data.table"). I then tried to create a list and try it via lapply
:
sts <- list(centro, sur, norte)
sts <- lapply(sts, function(z) z[, store := names(z)])
Which doesn't work because names(z)
are "x", "y" and "store".
How can I do this with a loop (or a *pply function), instead of manually doing xyz[, store := "xyz"]
?
side notes
Upvotes: 0
Views: 666
Reputation: 1892
Wrap get
around i
, and assign the value back to the name contained in i
.
for (i in st){
assign(i, get(i)[, store := i])
}
Store the tables in a list and use the method proposed @arvi1000. A couple very obvious advantages
ls()
.lapply
instead of a for loop, which usually means cleaner codeAllStore <- do.call(what = rbind, args = my_dts)
.store
variable then. Use lapply
and an anonymous function to do this.my_dts
list, because
it's just duplication of data that will be saved in all_stores
anyway.Coincidentally, I had a similar business problem last week, and this is the method I used, working with about 250 csvs. Here's how I think your code would look.
st <- c("centro", "sur", "norte")
all_stores <- do.call(
what = rbind,
args = lapply(
X = st,
FUN = function(storeNM){
temp <- fread(input = paste0(storeNM, '.csv'))
temp[, store := storeNM]
return(temp)
}
)
)
Upvotes: 1
Reputation: 1094
Try this:
for (i in st)
{
eval(parse(text=i))[, ':='(store = i)]
}
On second thoughts , probably this is better. Assuming you have all your datasets read in using s<-lapply(filepathlist,fread)
then use:
st <- c("centro", "sur", "norte") # need to change this to a list of datasets you read in
st2<-lapply(st,function(i){eval(parse(text=i))[, ':='(store = i)]})
st3<-rbindlist(st2)
Output:
> st3
x y store
1: a 1 centro
2: b 2 centro
3: c 3 centro
4: d 4 centro
5: e 5 centro
6: b 3 sur
7: c 4 sur
8: d 5 sur
9: e 6 sur
10: f 7 sur
11: g 8 sur
12: b 3 norte
13: c 4 norte
14: d 5 norte
15: e 6 norte
16: f 7 norte
17: g 8 norte
18: h 9 norte
19: i 10 norte
20: j 11 norte
Upvotes: 0
Reputation: 9582
The best way is to store your objects in a list and then access them by indexing the list.
library(data.table)
my_dts <- list (
centro = data.table(x=letters[1:5], y = 1:5),
sur = data.table(x=letters[2:7], y = 3:8),
norte = data.table(x=letters[2:10], y = 3:11)
)
st <- c("centro", "sur", "norte")
for (i in st) my_dts[[i]][, store := i]
Upvotes: 3