jalapic
jalapic

Reputation: 14202

spreading data in R - allowing multiple values per cell

With these data

d <- data.frame(time=1:5, side=c("r","r","r","l","l"), val = c(1,2,1,2,1))
d

  time side val
1    1    r   1
2    2    r   2
3    3    r   1
4    4    l   2
5    5    l   1

We can spread to a tidy dataframe like this:

library(tidyverse)

d %>% spread(side,val)

Which gives:

  time  l  r
1    1 NA  1
2    2 NA  2
3    3 NA  1
4    4  2 NA
5    5  1 NA

But say we have more than one val for a given time/side. For example:

d <- data.frame(time=c(1:5,5), side=c("r","r","r","l","l","l"), val = c(1,2,1,2,1,2))

 time side val
1    1    r   1
2    2    r   2
3    3    r   1
4    4    l   2
5    5    l   1
6    5    l   2

Now this won't work because of duplicated values:

d %>% spread(side,val)

Error: Duplicate identifiers for rows (5, 6)

Is there an efficient way to force this behavior (or alternative). The output would be e.g.

  time  l      r
1    1 NA      1
2    2 NA      2
3    3 NA      1
4    4  2     NA
5    5  1, 2  NA

Upvotes: 0

Views: 46

Answers (1)

Adam Spannbauer
Adam Spannbauer

Reputation: 2757

The data.table/reshape2 equivalent of tidyr::spread is dcast. It has a more complicated syntax than spread, but it's more flexible. To accomplish your task we can use the below chunk.

We use the formula to 'spread' side by time (filling with the values in the val column), provide the fill value of NA, and specify we want to list elements together when aggregation is needed per value of time.

library(data.table)

d <- data.table(time=c(1:5,5), 
                side=c("r","r","r","l","l","l"), 
                val = c(1,2,1,2,1,2))


data.table::dcast(d, time ~ side, 
                  value.var='val',
                  fill=NA,  
                  fun.aggregate=list)
#OUTPUT

#    time   l  r
# 1:    1  NA  1
# 2:    2  NA  2
# 3:    3  NA  1
# 4:    4   2 NA
# 5:    5 1,2 NA

Upvotes: 1

Related Questions