Using spread with duplicate identifiers gives sparse matrix with NAs

Question

A user raised this on GitHub (https://github.com/tidyverse/tidyr/issues/41), and I see that Hadley identified it as a bug. However, no solution was given there. I still run into this problem whenever my data frame has duplicate identifiers:

library(tidyr)

dftest <- structure(list(key = c("a", "b", "c", "d", "c"), value = c(1,
2, 3, 2, 4)), .Names = c("key", "value"), row.names = c(NA, -5L
), class = c("tbl_df", "tbl", "data.frame"))

Now when I use spread from tidyr, I get an error, because I happen to have duplicate identifiers:

> dftest %>% spread(key, value)
Error: Duplicate identifiers for rows (3, 5)
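To see which rows the error message is pointing at, a quick check like the one below (a minimal sketch; the helper column name `row` is just for illustration) lists every row whose key occurs more than once. With no columns besides `key` and `value`, spread has nothing to tell those rows apart, so both "c" values compete for the same cell.

```r
library(tibble)
library(dplyr)

dftest <- tibble(key = c("a", "b", "c", "d", "c"), value = c(1, 2, 3, 2, 4))

# Keep the original row position, then keep only keys that appear more than once
dupes <- dftest %>%
  mutate(row = row_number()) %>%
  group_by(key) %>%
  filter(n() > 1) %>%
  ungroup()

print(dupes)
```

The `row` values returned are 3 and 5, which match the row numbers in the error message.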

So I add an ID column:

> dftest$id<-seq(1,5)
> dftest %>% spread(key,value)
# A tibble: 5 x 5
     id     a     b     c     d
  <int> <dbl> <dbl> <dbl> <dbl>
1     1    1.   NA    NA    NA 
2     2   NA     2.   NA    NA 
3     3   NA    NA     3.   NA 
4     4   NA    NA    NA     2.
5     5   NA    NA     4.   NA
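The diagonal shape follows from how spread works: every column other than `key` and `value` (here just `id`) identifies an output row. Because `id` is unique for each input row, each of the 5 rows becomes its own output row holding exactly one value. A small sketch that checks this on the same data:

```r
library(tibble)
library(dplyr)
library(tidyr)

dftest <- tibble(key = c("a", "b", "c", "d", "c"), value = c(1, 2, 3, 2, 4))
dftest$id <- seq(1, 5)

wide <- dftest %>% spread(key, value)

# Count the non-NA values in each output row, ignoring the id column
non_na_per_row <- rowSums(!is.na(select(wide, -id)))
print(non_na_per_row)
```

Each row has one non-NA value, and there are as many output rows as distinct `id` values.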

But the diagonal data frame is not what I want. I would like the first row of the output of spread to read 1, 2, 3, 2 (for a, b, c, d). Then the second value for column c (4) should fall right underneath it, in row 2. In other words, I have no use for a diagonal matrix full of NAs. Am I missing something? I ask with humility.
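One common workaround (a sketch, not the fix from the GitHub issue) is to number the occurrences of each key instead of numbering all rows. This assumes that the i-th occurrence of every key belongs in output row i; the helper column name `row` is arbitrary and is dropped at the end.

```r
library(tibble)
library(dplyr)
library(tidyr)

dftest <- tibble(key = c("a", "b", "c", "d", "c"), value = c(1, 2, 3, 2, 4))

# Number each occurrence within a key (first "c" -> 1, second "c" -> 2),
# so the (row, key) pair is unique and the first occurrences share row 1
wide <- dftest %>%
  group_by(key) %>%
  mutate(row = row_number()) %>%
  ungroup() %>%
  spread(key, value) %>%
  select(-row)

print(wide)
```

This yields two rows: 1, 2, 3, 2 in the first, and only the second "c" value (4) in the second. In newer tidyr versions, `pivot_wider(names_from = key, values_from = value)` supersedes `spread(key, value)`; the same per-key row index works with it.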
