Reputation: 1472
I have real trouble understanding transformations between lists and data frame. I will lay out simple example what I'm trying to achieve. I have list of data frames, which are snapshots of points in time.
my_list <- list(data.frame(a = 1:5, b = c("a", "b", "c", "d", "f"), c= c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01")),
data.frame(a = 1:5, b = c("a", "b", "c", "d", "f"), c= c("2015-02-01", "2015-02-01", "2015-02-01", "2015-02-01", "2015-02-01")))
I would like to transform this to following data frame.
a b c d f
2015-01-01 1 2 3 4 5
2015-02-01 1 2 3 4 5
Would appreciate, if there is solution to reverse this as well, so go from data frame back to original list.
Upvotes: 1
Views: 240
Reputation: 42544
You can try dcast()
.
But first, the list of data frames needs to be combined into one using rbindlist()
from package data.table
.
library(data.table)
dt_long <- rbindlist(my_list)
dt_long
# a b c
# 1: 1 a 2015-01-01
# 2: 2 b 2015-01-01
# 3: 3 c 2015-01-01
# 4: 4 d 2015-01-01
# 5: 5 f 2015-01-01
# 6: 1 a 2015-02-01
# 7: 2 b 2015-02-01
# 8: 3 c 2015-02-01
# 9: 4 d 2015-02-01
#10: 5 f 2015-02-01
dt_wide <- dcast(dt_long, c ~ b, value.var = "a")
dt_wide
# c a b c d f
#1: 2015-01-01 1 2 3 4 5
#2: 2015-02-01 1 2 3 4 5
The reverse operation is melt()
:
melt(dt_wide, id.vars = "c", variable.name = "b", value.name = "a")
# c b a
# 1: 2015-01-01 a 1
# 2: 2015-02-01 a 1
# 3: 2015-01-01 b 2
# 4: 2015-02-01 b 2
# 5: 2015-01-01 c 3
# 6: 2015-02-01 c 3
# 7: 2015-01-01 d 4
# 8: 2015-02-01 d 4
# 9: 2015-01-01 f 5
#10: 2015-02-01 f 5
Although still combined in one data.table, the column and row ordered can be adjusted as follows:
setcolorder(dt_long2, letters[1:3])
dt_long2[order(c, a)]
# a b c
# 1: 1 a 2015-01-01
# 2: 2 b 2015-01-01
# 3: 3 c 2015-01-01
# 4: 4 d 2015-01-01
# 5: 5 f 2015-01-01
# 6: 1 a 2015-02-01
# 7: 2 b 2015-02-01
# 8: 3 c 2015-02-01
# 9: 4 d 2015-02-01
#10: 5 f 2015-02-01
To complete the reverse operation, the large data.table
can be split up in a list of smaller ones by:
lapply(unique(dt_long2$c), function(x) dt_long2[c == x])
#[[1]]
# a b c
#1: 1 a 2015-01-01
#2: 2 b 2015-01-01
#3: 3 c 2015-01-01
#4: 4 d 2015-01-01
#5: 5 f 2015-01-01
#
#[[2]]
# a b c
#1: 1 a 2015-02-01
#2: 2 b 2015-02-01
#3: 3 c 2015-02-01
#4: 4 d 2015-02-01
#5: 5 f 2015-02-01
Here, it is assumed that c
is the discriminating variable.
Upvotes: 2