user12059497

Pivot wider produces nested object

This concerns the latest tidyr release (1.0.0). I am trying out the new pivot_wider() and pivot_longer() functions.

I expected the code below to return the original iris dataset, but instead I get a 3 x 5 tibble whose columns are nested lists. I have read https://tidyr.tidyverse.org/articles/pivot.html but still do not see what is happening or how to avoid it.

library(tidyr)
iris %>%
  pivot_longer(-Species, values_to = "count") %>%
  pivot_wider(names_from = name, values_from = count)

Expected output: the normal iris dataset (150 x 5).

Edit: I have read that wrapping the result in unnest() gives the expected output. I do not understand why I need to unnest when I never nested anything. Any basic help would be appreciated; I want to understand the concept behind what went wrong.

Upvotes: 6

Views: 6102

Answers (2)

moodymudskipper

Reputation: 47340

pivot_wider(), unlike nest(), allows us to aggregate multiple values when the rows are not given a unique identifier.

By default it aggregates with list, and it warns you that it is doing so.

To expand the output we could use unnest(), as already suggested, but unchop() is more idiomatic here: the nested values are plain vectors, so they only need to be lengthened into rows, not widened into new columns.

To sum up, you can get back your initial data (except that it will be a tibble) with:

library(tidyr)
iris %>% 
  pivot_longer(-Species,values_to = "count") %>% 
  print() %>%
  pivot_wider(names_from = name, 
              values_from = count, 
              values_fn = list(count=list)) %>%
  print() %>%
  unchop(everything()) %>%
  print() %>%
  all.equal(iris)
#> # A tibble: 600 x 3
#>    Species name         count
#>    <fct>   <chr>        <dbl>
#>  1 setosa  Sepal.Length   5.1
#>  2 setosa  Sepal.Width    3.5
#>  3 setosa  Petal.Length   1.4
#>  4 setosa  Petal.Width    0.2
#>  5 setosa  Sepal.Length   4.9
#>  6 setosa  Sepal.Width    3  
#>  7 setosa  Petal.Length   1.4
#>  8 setosa  Petal.Width    0.2
#>  9 setosa  Sepal.Length   4.7
#> 10 setosa  Sepal.Width    3.2
#> # ... with 590 more rows
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   <fct>       <list<dbl>> <list<dbl>>  <list<dbl>> <list<dbl>>
#> 1 setosa             [50]        [50]         [50]        [50]
#> 2 versicolor         [50]        [50]         [50]        [50]
#> 3 virginica          [50]        [50]         [50]        [50]
#> # A tibble: 150 x 5
#>    Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>    <fct>          <dbl>       <dbl>        <dbl>       <dbl>
#>  1 setosa           5.1         3.5          1.4         0.2
#>  2 setosa           4.9         3            1.4         0.2
#>  3 setosa           4.7         3.2          1.3         0.2
#>  4 setosa           4.6         3.1          1.5         0.2
#>  5 setosa           5           3.6          1.4         0.2
#>  6 setosa           5.4         3.9          1.7         0.4
#>  7 setosa           4.6         3.4          1.4         0.3
#>  8 setosa           5           3.4          1.5         0.2
#>  9 setosa           4.4         2.9          1.4         0.2
#> 10 setosa           4.9         3.1          1.5         0.1
#> # ... with 140 more rows
#> [1] TRUE

Created on 2019-09-15 by the reprex package (v0.3.0)
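
The aggregation mentioned at the top does not have to be list. As a minimal sketch, assuming a per-species summary were wanted rather than the original rows, an aggregating function such as mean can be passed through values_fn. The name species_means is made up for this illustration, and the named-list form follows the tidyr 1.0.0 syntax used above:

```r
library(tidyr)

# Illustration only: collapse the 50 values per Species x name with mean()
# instead of collecting them in list-columns.
species_means <- iris %>%
  pivot_longer(-Species, values_to = "count") %>%
  pivot_wider(names_from = name,
              values_from = count,
              values_fn = list(count = mean))

dim(species_means)
#> [1] 3 5
```

This gives one row per species with ordinary numeric columns, so nothing needs to be unchopped afterwards, but the individual observations are of course lost.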

Upvotes: 4

user12059497

As I learnt from Akrun, other helpful friends, and other posts, this is not a bug.

spread(., name, count) throws an error because each Species x name combination has multiple rows. pivot_wider() handles this more gracefully by returning list-columns instead. If we add a unique ID to each row, it works fine.

library(tidyverse)

iris %>%
  rowid_to_column() %>% 
  pivot_longer(-c(rowid, Species), values_to = "count") %>%
  pivot_wider(names_from = name, values_from = count) %>% 
  select(-rowid)
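
To check the claim about duplicates, one can count how many rows share each Species x name combination in the long data. This is a small sketch using base R's table(); the names long and dup_counts are only illustrative:

```r
library(tidyr)

long <- pivot_longer(iris, -Species, values_to = "count")

# Rows per Species x name cell; anything above 1 means pivot_wider()
# cannot put the values into a single cell without an extra ID column.
dup_counts <- table(long$Species, long$name)
dup_counts
```

Every cell holds 50 rows, which is why each list-column entry has length 50. Once rowid is part of the key, each combination occurs exactly once and the values fit into ordinary columns.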

Upvotes: 7
