Reputation: 159
I have a data frame with vectors in a format like the following:
ID <- c("ID1", "ID1", "ID1", "ID2", "ID2", "ID3")
ModNum <- c(1, 2, 3, 1, 2, 0)
Amnt <- c(2.00, 3.00, 2.00, 5.00, 1.00, 5.00)
df <- data.frame(ID, ModNum, Amnt)
My desired output is a new column "Mod" in the data frame, which would look something like:
ID Mod
ID1 ((1, 2.00), (2, 3.00), (3, 2.00))
ID2 ((1, 5.00), (2, 1.00))
ID3 ((0, 5.00))
Then I would delete the redundant IDs.
I have considered using tapply and looping over the IDs to append to a list, but I am a bit confused about how to go about this.
Related questions: "How to add variable key/value pair to list object?" and "`tapply()` to return data frame".
Upvotes: 1
Views: 241
Reputation: 50704
Another solution with plyr package:
df$Mod <- sprintf("(%i, %.2f)", df$ModNum, df$Amnt) # prepare format
library(plyr)
ddply(df, .(ID), summarise, Mod=paste(Mod, collapse=", "))
# ID Mod
# 1 ID1 (1, 2.00), (2, 3.00), (3, 2.00)
# 2 ID2 (1, 5.00), (2, 1.00)
# 3 ID3 (0, 5.00)
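If you would rather not load plyr, the same collapse can be sketched in base R with aggregate(). This is a swapped-in alternative, not the code from the answer above, and the name res is only illustrative:

```r
# Rebuild the example data from the question
ID <- c("ID1", "ID1", "ID1", "ID2", "ID2", "ID3")
ModNum <- c(1, 2, 3, 1, 2, 0)
Amnt <- c(2.00, 3.00, 2.00, 5.00, 1.00, 5.00)
df <- data.frame(ID, ModNum, Amnt)

# Format each row as "(ModNum, Amnt)", exactly as in the plyr version
df$Mod <- sprintf("(%i, %.2f)", df$ModNum, df$Amnt)

# Paste the formatted pairs together within each ID;
# collapse = ", " is passed through to paste()
res <- aggregate(Mod ~ ID, data = df, FUN = paste, collapse = ", ")
res
```

The result has one row per ID, so the redundant IDs are dropped as a side effect.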
Upvotes: 1
Reputation: 21502
I would recommend organizing the output a little differently, so that Mod is a list with three elements named ID1, ID2, ID3, and each of those elements is a matrix with two columns. So ID2 would be
1 5.00
2 1.00
Edit: using split as in the other answer is much cleaner.
Then,
Rgames> df<-as.list(1:length(unique(ID)))
Rgames> names(df)<-unique(ID)
Rgames> df$ID1<-cbind(ModNum[ID=="ID1"],Amnt[ID=="ID1"])
Rgames> df
$ID1
[,1] [,2]
[1,] 1 2
[2,] 2 3
[3,] 3 2
$ID2
[1] 2
$ID3
[1] 3
And of course you could do a loop or lapply to fill in all the ID slots.
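A minimal sketch of the lapply version, assuming the vectors ID, ModNum and Amnt from the question are in the workspace. The names ids and Mod are illustrative, and the column names on the matrices are an addition for readability:

```r
# Example vectors from the question
ID <- c("ID1", "ID1", "ID1", "ID2", "ID2", "ID3")
ModNum <- c(1, 2, 3, 1, 2, 0)
Amnt <- c(2.00, 3.00, 2.00, 5.00, 1.00, 5.00)

ids <- unique(ID)

# Build one two-column matrix (ModNum, Amnt) per ID
Mod <- lapply(ids, function(id) {
  cbind(ModNum = ModNum[ID == id], Amnt = Amnt[ID == id])
})
names(Mod) <- ids

# Inspect a single slot
Mod$ID2
```

Every slot is filled in one pass, so there is no need for the placeholder values that as.list(1:length(unique(ID))) leaves behind.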
Upvotes: 0
Reputation: 89057
Here is a solution using split().
> ID.split <- split(df[-1], df$ID)
> ID.split
$ID1
ModNum Amnt
1 1 2
2 2 3
3 3 2
$ID2
ModNum Amnt
4 1 5
5 2 1
$ID3
ModNum Amnt
6 0 5
>
> flat.list <- lapply(ID.split, function(x)as.vector(t(x)))
> df <- data.frame(ID = names(flat.list))
> df$Mod <- flat.list
> df
ID Mod
1 ID1 1, 2, 2, 3, 3, 2
2 ID2 1, 5, 2, 1
3 ID3 0, 5
It is my opinion that the output of split() (what I called ID.split above) is a much better data structure to work with from a programming point of view than the final output you asked for.
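To illustrate why the split() output is convenient, here is a small sketch that rebuilds the question's data frame and then works on each piece directly. The names totals and amt are hypothetical and only serve the example:

```r
# Rebuild the example data from the question
ID <- c("ID1", "ID1", "ID1", "ID2", "ID2", "ID3")
ModNum <- c(1, 2, 3, 1, 2, 0)
Amnt <- c(2.00, 3.00, 2.00, 5.00, 1.00, 5.00)
df <- data.frame(ID, ModNum, Amnt)

# One small data frame per ID, as in the answer
ID.split <- split(df[-1], df$ID)

# Per-ID summaries are a one-liner: total amount for each ID
totals <- sapply(ID.split, function(x) sum(x$Amnt))

# Looking up a single value stays readable: Amnt for ModNum 2 of ID2
amt <- ID.split$ID2$Amnt[ID.split$ID2$ModNum == 2]
```

Each element keeps its column names and types, which the flattened vectors in the Mod column do not.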
Upvotes: 1