Reputation: 91
I am having trouble with reshpe command.
I'd like to change the data structure from long to wide and visa versa.
I've tried spread
and reshape
and dcast
, as well.
But I failed and I do not know why.
Here's my data and code
d1<-data.frame(ID=c("id1","id2","id3","id4"), year=c(2000,2001),val=c(8,7,9,8,2,3,5,6))
I'd like to place ID
variable to the row as below.
year id1 id2 id3 id4
2000 8 9 2 5
2001 7 8 3 6
I've tried
spread(d1, ID, val)
and
reshape(d1, ID, year,direction="wide")
and
dcast(d1, year~ID, rank)
But I got errors.
Can anyone help me? Thanks.
Upvotes: 0
Views: 31
Reputation: 459
There is an issue here with the way in which you have defined d1
. Specifically, you have defined its contents as three vectors of different lengths. This is okay, in principle, as data.frame
will effectively recycle the vectors in order to coerce them to equal length. The problem is that the way in which this recycling is done is inconsistent with the the wide format you ultimately want to get. We can show that by using pivot_wider()
- one of the more recent additions to the tidyr
package:
d1 <-
data.frame(
ID = c("id1", "id2", "id3", "id4"),
year = c(2000, 2001),
val = c(8, 7, 9, 8, 2, 3, 5, 6)
)
tidyr::pivot_wider(d1, names_from = ID, values_from = val)
#> Warning: Values in `val` are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list(val = list)` to suppress this warning.
#> * Use `values_fn = list(val = length)` to identify where the duplicates arise
#> * Use `values_fn = list(val = summary_fun)` to summarise duplicates
#> # A tibble: 2 x 5
#> year id1 id2 id3 id4
#> <dbl> <list> <list> <list> <list>
#> 1 2000 <dbl [2]> <NULL> <dbl [2]> <NULL>
#> 2 2001 <NULL> <dbl [2]> <NULL> <dbl [2]>
Notice that, while the command did not produce an error, the results produced certainly do not correspond to what you expected to get. Fortunately, fixing this is as straightforward as just defining the columns of the input data frame correctly. Below, I've done that for d2
and shown the results produced by pivot_wider()
. I believe this corresponds to what you were expecting to get?
d2 <- data.frame(
ID = c("id1", "id1", "id2", "id2", "id3", "id3", "id4", "id4"),
year = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001),
val = c(8, 7, 9, 8, 2, 3, 5, 6)
)
tidyr::pivot_wider(d2, names_from = ID, values_from = val)
#> # A tibble: 2 x 5
#> year id1 id2 id3 id4
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 8 9 2 5
#> 2 2001 7 8 3 6
Also, if you use d2
as input, your spread command should produce the same results as pivot_wider()
:
tidyr::spread(d2, ID, val)
#> year id1 id2 id3 id4
#> 1 2000 8 9 2 5
#> 2 2001 7 8 3 6
Upvotes: 1