Reputation: 55
I am new in R programming language. I have a data set that has 2 columns(ID and Num) like this:
ID Num
3 8
3 12
4 15
4 18
4 24
But I want to convert it to:
ID Num
3 8 12
4 15 18 24
3 and 4 are still in column 'ID' but 8 and 12 are in one row near each other, in 'Num' column with 'ID' of 3. And also 4 is in column 'ID' and 15 18 and 24 are in one row near each other, in 'Num' column with ID of 4. Can anyone help me convert original data set to this new type. I searched a lot but I couldn't find R code of this problem anywhere.
Upvotes: 3
Views: 1203
Reputation: 61164
You can also use aggregate
> aggregate(DF$Num~DF$ID, FUN=paste, sep=" ")
DF$ID DF$Num
1 3 8, 12
2 4 15, 18, 24
Alternatively, you can use data =
parameter of aggregate
to get the column names not have the DF$
:
aggregate(data=DF, Num~ID, FUN=paste, sep=" ")
# ID Num
# 1 3 8, 12
# 2 4 15, 18, 24
Upvotes: 3
Reputation: 118799
Alternatively if you want the Num
column to be a list, you can do something this:
using by
:
do.call(rbind, by(df, df$ID, FUN=function(x)
data.frame(ID=x$ID[1], Num = I(list(x$Num)))))
# ID Num
# 3 3 8, 12
# 4 4 15, 18, 24
Or using split
+ lapply
:
do.call(rbind, lapply(split(df, df$ID), function(x)
data.frame(ID=x$ID[1], Num=I(list(x$Num)))))
Or using plyr
package:
require(plyr)
ddply(df, .(ID), function(x) data.frame(ID = x$ID[1], Num = I(list(x$Num))))
Or using data.table
package:
require(data.table)
dt <- as.data.table(df)
dt[, list(Num = list(Num)),by = ID]
Upvotes: 1
Reputation: 9157
The problem with the data format you want to get is that it requires a varying number of columns. Of course, if you have at most three values for each id
, you could just add three columns. But that will get rather complicated and hard to handle for id
s with say 100 values.
On way around is to use lists. Here, The number of columns is not fixed anymore.
The way to archive what you want with lists is not difficult:
d <- data.frame(id=c(3,3,4,4,4), num=c(8,12,15,18,24)) # Just your sample data
l <- with(d, tapply(num, id, c))
What happens above? with
just spares me the need for typing d$num
and d$id
and does nothing for the actual solution. The key lies in tapply
. Here, we group all values of num
by id
and call c
separately for those groups. tapply
then collects the outputs and returns a datastructure that fits the resulting values best - in our case that is a list. The result:
> l
$`3`
[1] 8 12
$`4`
[1] 15 18 24
You can query only parts using
> l[[1]] # The first element in the list
[1] 8 12
> l[['3']] # The element with key (id) `3`
[1] 8 12
One more way. If you rather want the numbers pasted together as a string in a single column this of course is possible as well:
> with(d, tapply(num, id, paste, collapse=' '))
3 4
"8 12" "15 18 24"
Upvotes: 1