wan
wan

Reputation: 91

Change the data structure from long to wide and visa versa

I am having trouble with reshpe command. I'd like to change the data structure from long to wide and visa versa. I've tried spread and reshape and dcast, as well. But I failed and I do not know why. Here's my data and code

d1<-data.frame(ID=c("id1","id2","id3","id4"), year=c(2000,2001),val=c(8,7,9,8,2,3,5,6))

I'd like to place ID variable to the row as below.

  year id1 id2 id3 id4
  2000  8   9   2   5
  2001  7   8   3   6  

I've tried spread(d1, ID, val) and reshape(d1, ID, year,direction="wide")and dcast(d1, year~ID, rank) But I got errors.

Can anyone help me? Thanks.

Upvotes: 0

Views: 31

Answers (1)

hendrikvanb
hendrikvanb

Reputation: 459

There is an issue here with the way in which you have defined d1. Specifically, you have defined its contents as three vectors of different lengths. This is okay, in principle, as data.frame will effectively recycle the vectors in order to coerce them to equal length. The problem is that the way in which this recycling is done is inconsistent with the the wide format you ultimately want to get. We can show that by using pivot_wider() - one of the more recent additions to the tidyr package:

d1 <-
  data.frame(
    ID = c("id1", "id2", "id3", "id4"),
    year = c(2000, 2001),
    val = c(8, 7, 9, 8, 2, 3, 5, 6)
  )

tidyr::pivot_wider(d1, names_from = ID, values_from = val)
#> Warning: Values in `val` are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list(val = list)` to suppress this warning.
#> * Use `values_fn = list(val = length)` to identify where the duplicates arise
#> * Use `values_fn = list(val = summary_fun)` to summarise duplicates
#> # A tibble: 2 x 5
#>    year id1       id2       id3       id4      
#>   <dbl> <list>    <list>    <list>    <list>   
#> 1  2000 <dbl [2]> <NULL>    <dbl [2]> <NULL>   
#> 2  2001 <NULL>    <dbl [2]> <NULL>    <dbl [2]>

Notice that, while the command did not produce an error, the results produced certainly do not correspond to what you expected to get. Fortunately, fixing this is as straightforward as just defining the columns of the input data frame correctly. Below, I've done that for d2 and shown the results produced by pivot_wider(). I believe this corresponds to what you were expecting to get?

d2 <- data.frame(
  ID = c("id1", "id1", "id2", "id2", "id3", "id3", "id4", "id4"),
  year = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001),
  val = c(8, 7, 9, 8, 2, 3, 5, 6)
)

tidyr::pivot_wider(d2, names_from = ID, values_from = val)
#> # A tibble: 2 x 5
#>    year   id1   id2   id3   id4
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2000     8     9     2     5
#> 2  2001     7     8     3     6

Also, if you use d2 as input, your spread command should produce the same results as pivot_wider():

tidyr::spread(d2, ID, val)
#>   year id1 id2 id3 id4
#> 1 2000   8   9   2   5
#> 2 2001   7   8   3   6

Upvotes: 1

Related Questions