Reputation: 7317
I need to turn this
id | amount | day
---------------------
A | 10 | 0
A | 54 | 8
A | 23 | 18
A | 43 | 28
A | 87 | 51
B | 34 | 0
B | 76 | 1
B | 12 | 7
into this
id | a1 | a2 | a3 | a4 | a5 | d1 | d2 | d3 | d4 | d5
--------------------------------------------------------
A | 10 | 54 | 23 | 43 | 87 | 0 | 8 | 18 | 28 | 51
B | 34 | 76 | 12 | 0 | 0 | 0 | 1 | 7 | 0 | 0
ie. transpose/cast the rows of the df to an unknown number of columns by id, putting zeros where there would be empty values because of the inequal length.
I've experimented with
df <- data.frame(id=c('A','A','A','A','A','B','B','B'), amount=c(10,54,23,43,87,34,76,12), day=c(0,8,18,28,51,0,1,7))
library(reshape2)
x <- dcast(df, id ~ day, mean, value = 'amount')
but it's not quite right. How do I do it?
Upvotes: 3
Views: 1278
Reputation: 121568
I have to create a new variable dd
df$dd <-unlist(by(df$id,df$id, FUN= function(x)seq(1,length(x))))
id amount day dd
1 A 10 0 1
2 A 54 8 2
3 A 23 18 3
4 A 43 28 4
5 A 87 51 5
6 B 34 0 1
7 B 76 1 2
8 B 12 7 3
mm <- melt(df,id.vars=c('id','dd'),measure.vars=c('amount','day'))
dcast(mm,id~variable+dd,fun.aggregate=mean)
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1 A 10 54 23 43 87 0 8 18 28 51
2 B 34 76 12 NaN NaN 0 1 7 NaN NaN
EDIT To get a nice ouput with 0 , I replace mean with a custom function
dcast(mm,id~variable+dd,fun.aggregate=
function(x) ifelse(is.nan(mean(x)),0,mean(x)))
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1 A 10 54 23 43 87 0 8 18 28 51
2 B 34 76 12 0 0 0 1 7 0 0
Upvotes: 3
Reputation: 193527
Use base R reshape()
after introducing a "time" variable:
df$time <- ave(as.numeric(as.character(df$id)), df$id, FUN = seq_along)
df
# id amount day time
# 1 A 10 0 1
# 2 A 54 8 2
# 3 A 23 18 3
# 4 A 43 28 4
# 5 A 87 51 5
# 6 B 34 0 1
# 7 B 76 1 2
# 8 B 12 7 3
reshape(df, direction = "wide", idvar="id", timevar="time")
# id amount.1 day.1 amount.2 day.2 amount.3 day.3 amount.4 day.4 amount.5 day.5
# 1 A 10 0 54 8 23 18 43 28 87 51
# 6 B 34 0 76 1 12 7 NA NA NA NA
Optional steps:
Reorganize column order:
df2 <- df2[c("id",
grep("amount", names(df2), value=TRUE),
grep("day", names(df2), value = TRUE))]
Replace NA
with 0
:
df2[is.na(df2)] <- 0
df2
# id amount.1 amount.2 amount.3 amount.4 amount.5 day.1 day.2 day.3 day.4 day.5
# 1 A 10 54 23 43 87 0 8 18 28 51
# 6 B 34 76 12 0 0 0 1 7 0 0
Upvotes: 4