PavoDive
PavoDive

Reputation: 6496

data.table adding column to several data tables in loop

I'm having a very simple problem, but have not found a way out of it. I was given about 200 tables from different stores, and have to analyze the whole body of data as a unit. As the tables are fairly big, I'm working with data.table::fread. My plan is loading each table, rbinding them together and proceed with the analysis. In order to retain which store is which, I'd like to add a variable store to each table, with the name of the store(same as the table). For that I planned to run a for loop with the names of the tables and create the new variable.

# I'll create some dummy data tables here. In reality they're loaded with fread
library(data.table)
centro <- data.table(x=letters[1:5], y = 1:5)
sur <- data.table(x=letters[2:7], y = 3:8)
...
norte <- data.table(x=letters[2:10], y = 3:11)

I need each table to have a variable "store" whose value is "centro", "sur" or "norte", depending on the store it belongs to. So I tried:

st <- c("centro", "sur", "norte")
for (i in st){
   i[, store := i]
}

Which, of course, didn't work (it complains that "i is not a data.table"). I then tried to create a list and try it via lapply:

sts <- list(centro, sur, norte)
sts <- lapply(sts, function(z) z[, store := names(z)])

Which doesn't work because names(z) are "x", "y" and "store".

How can I do this with a loop (or a *pply function), instead of manually doing xyz[, store := "xyz"] ?

side notes

Upvotes: 0

Views: 666

Answers (3)

Adam Hoelscher
Adam Hoelscher

Reputation: 1892

Direct answer, very little change

Wrap get around i, and assign the value back to the name contained in i.

for (i in st){
  assign(i, get(i)[, store := i])
}

Better system, some manipulation

Store the tables in a list and use the method proposed @arvi1000. A couple very obvious advantages

  1. Since your individual stores are all items under a list, your workspace has about 200 fewer objects in it, which is nice when you ls().
  2. You can use lapply instead of a for loop, which usually means cleaner code
  3. You can use AllStore <- do.call(what = rbind, args = my_dts).

'More Better', back to the beginning

  • When you read in the data from different stores, set the value of the store variable then. Use lapply and an anonymous function to do this.
  • Don't bother saving the my_dts list, because it's just duplication of data that will be saved in all_stores anyway.

Coincidentally, I had a similar business problem last week, and this is the method I used, working with about 250 csvs. Here's how I think your code would look.

st <- c("centro", "sur", "norte")

all_stores <- do.call(
  what = rbind,
  args = lapply(
    X = st,
    FUN = function(storeNM){
      temp <- fread(input = paste0(storeNM, '.csv'))
      temp[, store := storeNM]
      return(temp)
    }
  )
)

Upvotes: 1

Gaurav Taneja
Gaurav Taneja

Reputation: 1094

Try this:

for (i in st)
{
eval(parse(text=i))[, ':='(store = i)]
}

On second thoughts , probably this is better. Assuming you have all your datasets read in using s<-lapply(filepathlist,fread) then use:

st <- c("centro", "sur", "norte") # need to change this to a list of datasets you read in    
st2<-lapply(st,function(i){eval(parse(text=i))[, ':='(store = i)]})
st3<-rbindlist(st2)

Output:

    > st3
        x  y  store
     1: a  1 centro
     2: b  2 centro
     3: c  3 centro
     4: d  4 centro
     5: e  5 centro
     6: b  3    sur
     7: c  4    sur
     8: d  5    sur
     9: e  6    sur
    10: f  7    sur
    11: g  8    sur
    12: b  3  norte
    13: c  4  norte
    14: d  5  norte
    15: e  6  norte
    16: f  7  norte
    17: g  8  norte
    18: h  9  norte
    19: i 10  norte
    20: j 11  norte

Upvotes: 0

arvi1000
arvi1000

Reputation: 9582

The best way is to store your objects in a list and then access them by indexing the list.

library(data.table)

my_dts <- list (
  centro = data.table(x=letters[1:5], y = 1:5),
  sur = data.table(x=letters[2:7], y = 3:8),
  norte = data.table(x=letters[2:10], y = 3:11)
)

st <- c("centro", "sur", "norte")
for (i in st) my_dts[[i]][, store := i]

Upvotes: 3

Related Questions