spreading data in R - allowing multiple values per cell

Question

With these data

d <- data.frame(time=1:5, side=c("r","r","r","l","l"), val = c(1,2,1,2,1))
d

  time side val
1    1    r   1
2    2    r   2
3    3    r   1
4    4    l   2
5    5    l   1

We can spread to a tidy dataframe like this:

library(tidyverse)

d %>% spread(side,val)

Which gives:

  time  l  r
1    1 NA  1
2    2 NA  2
3    3 NA  1
4    4  2 NA
5    5  1 NA

But say we have more than one val for a given time/side. For example:

d <- data.frame(time=c(1:5,5), side=c("r","r","r","l","l","l"), val = c(1,2,1,2,1,2))

 time side val
1    1    r   1
2    2    r   2
3    3    r   1
4    4    l   2
5    5    l   1
6    5    l   2

Now this won't work because of duplicated values:

d %>% spread(side,val)

Error: Duplicate identifiers for rows (5, 6)

Is there an efficient way to force this behavior (or alternative). The output would be e.g.

  time  l      r
1    1 NA      1
2    2 NA      2
3    3 NA      1
4    4  2     NA
5    5  1, 2  NA

Adam Spannbauer · Accepted Answer

The data.table/reshape2 equivalent of tidyr::spread is dcast. It has a more complicated syntax than spread, but it's more flexible. To accomplish your task we can use the below chunk.

We use the formula to 'spread' side by time (filling with the values in the val column), provide the fill value of NA, and specify we want to list elements together when aggregation is needed per value of time.

library(data.table)

d <- data.table(time=c(1:5,5), 
                side=c("r","r","r","l","l","l"), 
                val = c(1,2,1,2,1,2))


data.table::dcast(d, time ~ side, 
                  value.var='val',
                  fill=NA,  
                  fun.aggregate=list)
#OUTPUT

#    time   l  r
# 1:    1  NA  1
# 2:    2  NA  2
# 3:    3  NA  1
# 4:    4   2 NA
# 5:    5 1,2 NA

spreading data in R - allowing multiple values per cell

Answers (1)

Related Questions