Gaurav Bansal

Reputation: 5660

Automating split up of data frame

I have the following data frame in R:

> head(df)
        date x y z n  t
1 2012-01-01 1 1 1 0 52
2 2012-01-01 1 1 2 0 52
3 2012-01-01 1 1 3 0 52
4 2012-01-01 1 1 4 0 52
5 2012-01-01 1 1 5 0 52
6 2012-01-01 1 1 6 0 52
> str(df)
'data.frame':   4617600 obs. of  6 variables:
 $ date: Date, format: "2012-01-01" "2012-01-01" "2012-01-01" "2012-01-01" ...
 $ x   : Factor w/ 45 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ y   : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ z   : Factor w/ 111 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ n   : int  0 0 0 0 0 0 0 0 29 0 ...
 $ t   : num  52 52 52 52 52 52 52 52 52 52 ...

What I want to do is split this large df into smaller data frames as follows: 1) one data frame for each of the 45 levels of 'x'; 2) each of those 45 data frames split further, one for each of the 111 levels of 'z'. That gives a total of 45*111=4995 data frames.

I've seen plenty online about splitting a data frame, which turns it into a list of data frames. However, I'm not seeing how to split those lists further. My other concern is memory. If I split the data frame into a list, won't it still take up just as much memory? Running prediction models on the split data on top of that seems infeasible. Ideally I would split the data into many data frames, run the prediction models on the first one, save the results I need, and then delete it before moving on to the next one.

Upvotes: 0

Views: 87

Answers (1)

Frank

Reputation: 66819

Here's what I would do. Your data already fits in memory, so just leave it in one piece:

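library(data.table)
setDT(df)  # convert df to a data.table by reference (no copy is made)

# The expression in braces is evaluated once per (x, z) combination
df[, {
  sum(t * n)  # or whatever you're doing for "prediction models"
}, by = list(x, z)]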
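
To make the per-group idea concrete, here is a minimal sketch on a small made-up data set. The model `lm(n ~ t)` is only a placeholder assumption, since the question doesn't say which prediction model is meant, and the names `fits`, `pieces` and `slopes` are invented for illustration. The last part shows the base R route the question asks about: passing a list of two factors to `split()` splits on both at once, so there is no need to split a list a second time.

```r
library(data.table)

# Small synthetic stand-in for the real data (values are made up)
set.seed(1)
df <- data.frame(
  x = factor(rep(1:3, each = 40)),
  z = factor(rep(rep(1:2, each = 20), times = 3)),
  t = rep(1:20, times = 6)
)
df$n <- rpois(nrow(df), lambda = 5)

setDT(df)  # convert to a data.table by reference

# Fit one placeholder model per (x, z) group.
# .SD holds only the rows of the current group; .N is its row count.
fits <- df[, {
  m <- lm(n ~ t, data = .SD)
  list(intercept = coef(m)[[1]], slope = coef(m)[[2]], n_obs = .N)
}, by = .(x, z)]

print(fits)

# Base R alternative: split on both factors in a single call.
# drop = TRUE omits (x, z) combinations that have no rows.
pieces <- split(as.data.frame(df), list(df$x, df$z), drop = TRUE)
slopes <- sapply(pieces, function(d) coef(lm(n ~ t, data = d))[[2]])
```

With the grouped version only one group's rows are handed to the model at a time and only the small result table is kept, which is close to the "fit, keep the result, move on" workflow described in the question. The `split()` version materialises all the pieces at once, so it needs roughly as much memory again as the original data.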

Upvotes: 1
