Reputation: 1
A user asked a question on GitHub, https://github.com/tidyverse/tidyr/issues/41, and I see that Hadley identified this as a bug. However, no solution was given. I still experience this problem when I have duplicate identifiers in my data frame:
dftest <- structure(list(key = c("a", "b", "c", "d", "c"), value = c(1,
2, 3, 2, 4)), .Names = c("key", "value"), row.names = c(NA, -5L
), class = c("tbl_df", "tbl", "data.frame"))
Now when I use spread from tidyr, I get an error, because I happen to have duplicate identifiers:
dftest %>% spread(key,value)
Error: Duplicate identifiers for rows (3, 5)
So I add an ID column, which gets rid of the error but leaves me with a sparse result full of NAs:
> dftest$id<-seq(1,5)
> dftest %>% spread(key,value)
# A tibble: 5 x 5
id a b c d
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1. NA NA NA
2 2 NA 2. NA NA
3 3 NA NA 3. NA
4 4 NA NA NA 2.
5 5 NA NA 4. NA
But this diagonal data frame is not what I want. I would like the first row of the spread output to read 1, 2, 3, 2, with the second value for column c (4) falling right underneath it in row 2. That is to say, I have no use for a diagonal matrix padded with NAs. Am I missing something? I ask with humility.
Upvotes: 0
Views: 392
Reputation: 578
You're so close to getting the right output.
Using dftest from your original input.
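Method (number the rows within each key, rather than across the whole data frame, so that only repeated keys get an id above 1):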
dftest %>% group_by(key) %>% mutate(id = 1:length(key)) %>% spread(key, value)
Output:
# A tibble: 2 x 5
id a b c d
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1. 2. 3. 2.
2 2 NA NA 4. NA
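For what it's worth, the same idea can be sketched with pivot_wider(), which superseded spread() in tidyr 1.0.0. This is only a minimal sketch, assuming tidyr 1.0.0 or later and dplyr are installed; the name wide is just an illustrative variable, and the data frame is rebuilt with tibble() instead of the structure() call from the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt with tibble() for readability.
dftest <- tibble(
  key   = c("a", "b", "c", "d", "c"),
  value = c(1, 2, 3, 2, 4)
)

wide <- dftest %>%
  group_by(key) %>%
  mutate(id = row_number()) %>%   # 1 for the first occurrence of a key, 2 for the second, ...
  ungroup() %>%                   # drop the grouping before reshaping
  pivot_wider(names_from = key, values_from = value)

print(wide)
```

This should give the same two rows as the spread() version: the first occurrence of each key lands in the row with id 1, and the second c value lands in the row with id 2, with NA for the keys that only occur once.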
Upvotes: 2