Reputation: 2022
The tibble below has a list-column property
that contains some missing values:
library(tidyverse)
tbl = tibble(type = c('scale', 'range', 'min', 'max'),
property = list(list(lttr = letters, mth = month.name), NA) %>%
rep(., 2))
# A tibble: 4 x 2
type property
<chr> <list>
1 scale <list [2]>
2 range <lgl [1]>
3 min <list [2]>
4 max <lgl [1]>
I would like to unnest this column and then spread the result into a wide format with three columns - type
, lttr
and mth
:
tbl = tibble(type = c('scale', 'range', 'min', 'max'),
property = list(list(lttr = letters, mth = month.name), NA) %>%
rep(., 2)) %>%
mutate(property = map_if(property, is_list, enframe)) %>%
unnest(property) %>%
spread(name, value)
However, the unnest
call throws the following error:
Error: Each column must either be a list of vectors or a list of data frames [property]
I came across a similar issue on Git that asks unnest
to support NULL
values but makes no mention of NAs
. There don't appear to be any arguments in the function documentation that pertain to missings either, but I could be wrong.
The pipeline works as expected if the NAs
are filtered out:
tbl = tibble(type = c('scale', 'range', 'min', 'max'),
property = list(list(lttr = letters, mth = month.name), NA) %>%
rep(., 2)) %>%
mutate(property = map_if(property, is_list, enframe)) %>%
filter(!is.na(property)) %>% # drop_na() and na_omit not working not sure why
unnest(property) %>%
spread(name, value)
tbl
# A tibble: 2 x 3
type lttr mth
<chr> <list> <list>
1 min <chr [26]> <chr [12]>
2 scale <chr [26]> <chr [12]>
Upvotes: 1
Views: 1763
Reputation: 887501
An option would be to convert everything into tibble
so that while unnest
ing the structure would be the same across rather than manually subsetting
library(tidyverse)
tbl %>%
mutate(property = map(property, ~ if(!is.list(.x))
enframe(list(nm1 = .x)) else enframe(.x))) %>%
unnest %>%
spread(name, value) %>%
select(type, lttr, mth)
# A tibble: 4 x 3
# type lttr mth
# <chr> <list> <list>
#1 max <NULL> <NULL>
#2 min <chr [26]> <chr [12]>
#3 range <NULL> <NULL>
#4 scale <chr [26]> <chr [12]>
The issue in the OP's example is that difference in structure for the NA
rows when compared to the other rows. When we filter
them out, the structure is same across and the issue got resolved
We can also check with another example where the number of list
elements are greater than 2.
tbl1 <- tibble(type = c('scale', 'range', 'min', 'max'),
property = list(list(lttr = letters, mth = month.name,
val1 = rnorm(12), val2 = runif(12)), NA) %>%
rep(., 2))
tbl1 %>%
mutate(property = map(property, ~ if(!is.list(.x)) enframe(list(nm1 = .x))
else enframe(.x))) %>%
unnest %>%
spread(name, value) %>%
select(-nm1)
# A tibble: 4 x 5
# type lttr mth val1 val2
# <chr> <list> <list> <list> <list>
#1 max <NULL> <NULL> <NULL> <NULL>
#2 min <chr [26]> <chr [12]> <dbl [12]> <dbl [12]>
#3 range <NULL> <NULL> <NULL> <NULL>
#4 scale <chr [26]> <chr [12]> <dbl [12]> <dbl [12]>
This can be extended to arbitrary number of elements
Upvotes: 1
Reputation: 389135
How about unnest
ing the tbl
, group_by
type
and then create new columns with summarise
?
library(dplyr)
library(tidyr)
tbl %>%
unnest() %>%
group_by(type) %>%
summarise(lttr = property[1L],
mth = property[2L])
# type lttr mth
# <chr> <list> <list>
#1 max <NULL> <NULL>
#2 min <chr [26]> <chr [12]>
#3 range <NULL> <NULL>
#4 scale <chr [26]> <chr [12]>
Upvotes: 1