Reputation: 3115
I've got myself in a little jam, and there is probably a better way to describe what I want to do (will edit if needed).
What I have is a data frame, x, representing some observations. I would like to create a different data frame, y, that has all distinct combinations of some variables from x, and where one of the columns is a list column holding the matching values of another variable from x.
I've simplified this into an example. Here is x:
x <- data.frame( c(1,1,1,1,1,1,1,2,2,2), c(11:12,11:12,11:12,11:12,16,17), c(101:110))
names(x) <- c("a","b","c")
a b c
1 1 11 101
2 1 12 102
3 1 11 103
4 1 12 104
5 1 11 105
6 1 12 106
7 1 11 107
8 2 12 108
9 2 16 109
10 2 17 110
And here is y (distinct combos of a,b in x):
y <- unique(data.frame(x$a,x$b))
names(y) <- c("a","b")
row.names(y) <- NULL
a b
1 1 11
2 1 12
3 2 12
4 2 16
5 2 17
What I want to do is to transform y into this:
a b c
1 1 11 101, 103, 105, 107
2 1 12 102, 104, 106
3 2 12 108
4 2 16 109
5 2 17 110
Where "c" in each row contains values of c from x collected into a list.
I'd like to find a nice succinct and idiomatic way of doing this, but will settle for anything that does the job.
Upvotes: 0
Views: 93
Reputation: 193517
This is going to be pretty cryptic looking:
aggregate(c ~ a + b, x, I)
# a b c
# 1 1 11 101, 103, 105, 107
# 2 1 12 102, 104, 106
# 3 2 12 108
# 4 2 16 109
# 5 2 17 110
The I function (you can also use c) would create a list in your third column. You don't need to create a separate data.frame for the unique combinations of "a" and "b". Just use them as the grouping variables in aggregate.
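To check that the third column really is a list, here is a minimal sketch that rebuilds x from the question and inspects the result. The name res is just a placeholder for illustration, and c is used as the aggregating function instead of I.

```r
# Rebuild the example data from the question.
x <- data.frame(a = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                b = c(11:12, 11:12, 11:12, 11:12, 16, 17),
                c = 101:110)

# c() as the aggregating function keeps every value of c per (a, b) group.
# "res" is a placeholder name, not something from the question.
res <- aggregate(c ~ a + b, x, c)

# The groups have different sizes, so the results cannot be simplified
# to a vector or matrix and the column stays a list.
is.list(res$c)

# Number of values collected in each row.
lengths(res$c)

# Values collected for the first row (a = 1, b = 11).
res$c[[1]]
```

Individual elements are reached with [[ ]], so res$c[[2]] gives the values for the second (a, b) combination.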
Of course, there are many other ways to do this.
Here's data.table:
library(data.table)
X <- as.data.table(x)
X[, list(c = list(I(c))), by = list(a, b)]
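If you already have the y data frame from the question and just want to fill in its c column, a plain base R sketch along these lines should also work. It is only one possible approach, assuming y holds the distinct (a, b) pairs, and it is less compact than the aggregate call.

```r
# Example data and the distinct (a, b) combinations, as in the question.
x <- data.frame(a = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                b = c(11:12, 11:12, 11:12, 11:12, 16, 17),
                c = 101:110)
y <- unique(x[c("a", "b")])
row.names(y) <- NULL

# For each row of y, collect the values of x$c whose a and b match that row.
# lapply() returns a list with one element per row, which becomes a list column.
y$c <- lapply(seq_len(nrow(y)), function(i) {
  x$c[x$a == y$a[i] & x$b == y$b[i]]
})

y
```

This rescans x once per row of y, so for large data the grouped approaches above are likely the better choice.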
Upvotes: 3