Reputation:
This is regarding the latest tidyr release (1.0.0). I am trying the pivot_wider() and pivot_longer() functions from library(tidyr).
I expected to get the normal iris dataset back when I run the code below, but instead I get a nested tibble of dimension 3 x 5. I'm not sure what's happening. I read https://tidyr.tidyverse.org/articles/pivot.html but still don't see how to avoid this.
library(tidyr)
iris %>%
  pivot_longer(-Species, values_to = "count") %>%
  pivot_wider(names_from = name, values_from = count)
Expected output: the normal iris dataset (150 x 5).
Edit: I read below that wrapping the result in unnest() gives the expected output. I don't understand why I need to unnest it when I never nested anything. Any basic help would be appreciated; I want to understand the concept of what went wrong.
Upvotes: 6
Views: 6102
Reputation: 47340
pivot_wider(), unlike nest(), allows us to aggregate multiple values when the rows are not given a unique identifier.
The default is to aggregate with list() and to be verbose about it (it emits a warning).
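To illustrate the aggregation point, here is a minimal sketch on a made-up data frame (the `long` data and its column names are hypothetical, not from the question). It assumes the named-list form of `values_fn` used elsewhere in this answer and swaps `list` for `mean`, so duplicates are collapsed to one number per cell instead of a list-column:

```r
library(tidyr)

# Hypothetical toy data: two rows share the same (group, name) pair,
# so there is no single value to put in that cell.
long <- data.frame(
  group = c("a", "a", "a", "b"),
  name  = c("x", "x", "y", "x"),
  count = c(1, 3, 5, 7)
)

# values_fn says how to collapse duplicates; here they are averaged.
wide <- pivot_wider(
  long,
  names_from  = name,
  values_from = count,
  values_fn   = list(count = mean)
)
print(wide)
```

Group "a" has two x values (1 and 3), so its x cell becomes their mean; group "b" has no y value, so that cell is NA.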
To expand the output we could use unnest(), as already suggested, but it's more idiomatic to use unchop() because we're not trying to expand a horizontal dimensionality in the nested values.
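As a small sketch of what unchop() does, here is a hand-built tibble with two list-columns (the `nested` object and its values are made up for illustration). Each list element is a plain vector, so expanding it only adds rows, never columns:

```r
library(tidyr)

# Hypothetical list-column tibble, similar in shape to what
# pivot_wider() returns when values are not uniquely identified.
nested <- tibble::tibble(
  group = c("a", "b"),
  x = list(c(1, 2), 3),
  y = list(c(10, 20), 30)
)

# unchop() lengthens the data: one row per element of the list-columns.
flat <- unchop(nested, c(x, y))
print(flat)
```

The list-columns being expanded together need matching lengths within each row, which is the case here and in the iris example (50 values per cell).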
So, to sum it all up, to get back your initial data (except that it will be a tibble), you can do:
library(tidyr)
iris %>%
  pivot_longer(-Species, values_to = "count") %>%
  print() %>%
  pivot_wider(names_from = name,
              values_from = count,
              values_fn = list(count = list)) %>%
  print() %>%
  unchop(everything()) %>%
  print() %>%
  all.equal(iris)
#> # A tibble: 600 x 3
#> Species name count
#> <fct> <chr> <dbl>
#> 1 setosa Sepal.Length 5.1
#> 2 setosa Sepal.Width 3.5
#> 3 setosa Petal.Length 1.4
#> 4 setosa Petal.Width 0.2
#> 5 setosa Sepal.Length 4.9
#> 6 setosa Sepal.Width 3
#> 7 setosa Petal.Length 1.4
#> 8 setosa Petal.Width 0.2
#> 9 setosa Sepal.Length 4.7
#> 10 setosa Sepal.Width 3.2
#> # ... with 590 more rows
#> # A tibble: 3 x 5
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <fct> <list<dbl>> <list<dbl>> <list<dbl>> <list<dbl>>
#> 1 setosa [50] [50] [50] [50]
#> 2 versicolor [50] [50] [50] [50]
#> 3 virginica [50] [50] [50] [50]
#> # A tibble: 150 x 5
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 5.1 3.5 1.4 0.2
#> 2 setosa 4.9 3 1.4 0.2
#> 3 setosa 4.7 3.2 1.3 0.2
#> 4 setosa 4.6 3.1 1.5 0.2
#> 5 setosa 5 3.6 1.4 0.2
#> 6 setosa 5.4 3.9 1.7 0.4
#> 7 setosa 4.6 3.4 1.4 0.3
#> 8 setosa 5 3.4 1.5 0.2
#> 9 setosa 4.4 2.9 1.4 0.2
#> 10 setosa 4.9 3.1 1.5 0.1
#> # ... with 140 more rows
#> [1] TRUE
Created on 2019-09-15 by the reprex package (v0.3.0)
Upvotes: 4
Reputation:
As I learnt from Akrun and other helpful friends and posts, this is not a bug.
spread(., name, count) throws an error because we have multiple rows for each Species x name combination. pivot_wider() does a better job by producing list-columns instead. If we add a unique ID to each row, it works fine.
library(tidyverse)
iris %>%
  rowid_to_column() %>%
  pivot_longer(-c(rowid, Species), values_to = "count") %>%
  pivot_wider(names_from = name, values_from = count) %>%
  select(-rowid)
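To see why the row ID matters, a quick check along these lines may help. It is only a sketch: the intermediate names (`long`, `dup_counts`, `long_id`, `max_per_cell`) are mine, and it uses base R's table() to count how many values land in each cell of the would-be wide table.

```r
library(tidyr)

# Without an ID: how many values fall into each Species x name cell?
long <- pivot_longer(iris, -Species, values_to = "count")
dup_counts <- table(long$Species, long$name)
print(dup_counts)  # every cell holds 50 values, hence the list-columns

# With an ID: each rowid x name cell holds exactly one value.
long_id <- pivot_longer(
  tibble::rowid_to_column(iris),
  -c(rowid, Species),
  values_to = "count"
)
max_per_cell <- max(table(long_id$rowid, long_id$name))
print(max_per_cell)
```

Once no cell holds more than one value, pivot_wider() has nothing to aggregate and returns ordinary columns.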
Upvotes: 7