stackoverflowuser2010

Reputation: 40889

Create vector of data frame subsets based on group by of columns

Suppose I have this R data frame:

              ts year month day
1  1295234818000 2011     1  17
2  1295234834000 2011     1  17
3  1295248650000 2011     1  17
4  1295775095000 2011     1  23
5  1296014022000 2011     1  26
6  1296098704000 2011     1  27
7  1296528979000 2011     2   1
8  1296528987000 2011     2   1
9  1297037448000 2011     2   7
10 1297037463000 2011     2   7

dput(a)
structure(list(ts = c(1295234818000, 1295234834000, 1295248650000, 
1295775095000, 1296014022000, 1296098704000, 1296528979000, 1296528987000, 
1297037448000, 1297037463000), year = c(2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011), month = c(1, 1, 1, 1, 1, 
1, 2, 2, 2, 2), day = c(17, 17, 17, 23, 26, 27, 1, 1, 7, 7)), .Names = c("ts", 
"year", "month", "day"), row.names = c(NA, 10L), class = "data.frame")

Is there a way to create a vector (in R terms, a list) of data frames, where each one is the subset of the original for one unique combination of year, month, and day? Ideally, I would like to get back data frames DF1, DF2, DF3, DF4, DF5, and DF6, in that order, where:

DF1:

              ts year month day
1  1295234818000 2011     1  17
2  1295234834000 2011     1  17
3  1295248650000 2011     1  17

DF2:

4  1295775095000 2011     1  23

DF3:

5  1296014022000 2011     1  26

DF4:

6  1296098704000 2011     1  27

DF5:

7  1296528979000 2011     2   1
8  1296528987000 2011     2   1

DF6:

9  1297037448000 2011     2   7
10 1297037463000 2011     2   7

Any help would be appreciated.

Upvotes: 3

Views: 3239

Answers (1)

lukeA

Reputation: 54237

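# `df` is the question's data frame (called `a` in the dput output).
df <- df[order(df$year, df$month, df$day), ]
# One data frame per year/month/day combination; drop=TRUE omits empty groups.
df.list <- split(df, list(df$year, df$month, df$day), drop=TRUE)
# Map the group names, in sorted order, to DF1, DF2, ...
listnames <- setNames(paste0("DF", seq_along(df.list)), sort(names(df.list)))
names(df.list) <- listnames[names(df.list)]
# Copy each list element into the global environment as DF1, ..., DF6.
list2env(df.list, envir=globalenv())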

# > DF1
#             ts year month day
# 1 1.295235e+12 2011     1  17
# 2 1.295235e+12 2011     1  17
# 3 1.295249e+12 2011     1  17
# > DF6
#               ts year month day
# 9  1.297037e+12 2011     2   7
# 10 1.297037e+12 2011     2   7
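
One caveat with the naming step: `sort()` on the group names compares them as strings, so the DF numbering is only chronological when the string order happens to match the date order, as it does for the question's data. The sketch below uses made-up group names (not from the question) to show a case where it does not, along with one possible numeric ordering; the variable names are illustrative only.

```r
# Hypothetical group names in the year.month.day form that split() produces.
group_names <- c("2011.2.1", "2011.1.17", "2011.1.3")

# sort() compares these as strings, so day 17 lands before day 3.
sorted_names <- sort(group_names)
print(sorted_names)

# Sketch of a numeric alternative: parse the three parts and order on them.
parts <- do.call(rbind, lapply(strsplit(group_names, ".", fixed = TRUE), as.numeric))
chrono_names <- group_names[order(parts[, 1], parts[, 2], parts[, 3])]
print(chrono_names)
```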

Edit:

As @thelatemail suggests, the same can be achieved more easily by passing the grouping columns to split in the order day, month, year, so that the groups come out already sorted by date:

df.list <- with(df, split(df, list(day,month,year), drop=TRUE)) 
df.list <- setNames(df.list, paste0("DF",seq_along(df.list)))
list2env(df.list, envir=globalenv())
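
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the question's data under the name `df` (the question calls it `a`) and, as an assumption on my part rather than part of the answer above, keeps the pieces in a list instead of copying them into the global environment.

```r
# Rebuild the question's data (named `a` there; `df` here to match the answer).
df <- data.frame(
  ts    = c(1295234818000, 1295234834000, 1295248650000, 1295775095000,
            1296014022000, 1296098704000, 1296528979000, 1296528987000,
            1297037448000, 1297037463000),
  year  = rep(2011, 10),
  month = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  day   = c(17, 17, 17, 23, 26, 27, 1, 1, 7, 7)
)

# In split(), the first grouping factor varies fastest, so listing
# day, month, year yields groups ordered by year, then month, then day.
df.list <- with(df, split(df, list(day, month, year), drop = TRUE))
print(names(df.list))
# "17.1.2011" "23.1.2011" "26.1.2011" "27.1.2011" "1.2.2011" "7.2.2011"

# Rename the groups DF1..DF6 in that order and check the row counts.
df.list <- setNames(df.list, paste0("DF", seq_along(df.list)))
print(sapply(df.list, nrow))
# DF1 DF2 DF3 DF4 DF5 DF6
#   3   1   1   1   2   2
```

Individual pieces are then available as `df.list$DF1` or `df.list[["DF1"]]`, which avoids creating six separate objects in the workspace.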

Upvotes: 3
