jenswirf
jenswirf

Reputation: 7317

How to transpose/cast two variables to one line?

I need to turn this

id  |  amount |  day
---------------------
 A  |   10    |  0 
 A  |   54    |  8
 A  |   23    |  18
 A  |   43    |  28
 A  |   87    |  51
 B  |   34    |  0
 B  |   76    |  1 
 B  |   12    |  7

into this

id | a1 | a2 | a3 | a4 | a5 | d1 | d2 | d3 | d4 | d5 
--------------------------------------------------------
 A | 10 | 54 | 23 | 43 | 87 | 0  |  8 | 18 | 28 | 51
 B | 34 | 76 | 12 | 0  |  0 | 0  |  1 |  7 | 0  | 0

ie. transpose/cast the rows of the df to an unknown number of columns by id, putting zeros where there would be empty values because of the inequal length.

I've experimented with

df <- data.frame(id=c('A','A','A','A','A','B','B','B'),    amount=c(10,54,23,43,87,34,76,12), day=c(0,8,18,28,51,0,1,7))
library(reshape2)
x <- dcast(df, id ~ day, mean, value = 'amount')

but it's not quite right. How do I do it?

Upvotes: 3

Views: 1278

Answers (2)

agstudy
agstudy

Reputation: 121568

I have to create a new variable dd

df$dd <-unlist(by(df$id,df$id, FUN= function(x)seq(1,length(x))))


id amount day dd
1  A     10   0  1
2  A     54   8  2
3  A     23  18  3
4  A     43  28  4
5  A     87  51  5
6  B     34   0  1
7  B     76   1  2
8  B     12   7  3

mm <- melt(df,id.vars=c('id','dd'),measure.vars=c('amount','day'))
dcast(mm,id~variable+dd,fun.aggregate=mean)
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1  A       10       54       23       43       87     0     8    18    28    51
2  B       34       76       12      NaN      NaN     0     1     7   NaN   NaN

EDIT To get a nice ouput with 0 , I replace mean with a custom function

dcast(mm,id~variable+dd,fun.aggregate=
                       function(x) ifelse(is.nan(mean(x)),0,mean(x)))
  id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1  A       10       54       23       43       87     0     8    18    28    51
2  B       34       76       12        0        0     0     1     7     0     0

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

Use base R reshape() after introducing a "time" variable:

df$time <- ave(as.numeric(as.character(df$id)), df$id, FUN = seq_along)
df
#   id amount day time
# 1  A     10   0    1
# 2  A     54   8    2
# 3  A     23  18    3
# 4  A     43  28    4
# 5  A     87  51    5
# 6  B     34   0    1
# 7  B     76   1    2
# 8  B     12   7    3
reshape(df, direction = "wide", idvar="id", timevar="time")
#   id amount.1 day.1 amount.2 day.2 amount.3 day.3 amount.4 day.4 amount.5 day.5
# 1  A       10     0       54     8       23    18       43    28       87    51
# 6  B       34     0       76     1       12     7       NA    NA       NA    NA

Optional steps:

  1. Reorganize column order:

    df2 <- df2[c("id", 
                 grep("amount", names(df2), value=TRUE), 
                 grep("day", names(df2), value = TRUE))]
    
  2. Replace NA with 0:

    df2[is.na(df2)] <- 0
    df2
    #   id amount.1 amount.2 amount.3 amount.4 amount.5 day.1 day.2 day.3 day.4 day.5
    # 1  A       10       54       23       43       87     0     8    18    28    51
    # 6  B       34       76       12        0        0     0     1     7     0     0
    

Upvotes: 4

Related Questions