user51462
user51462

Reputation: 2022

R - Is it possible to unnest a list-column that contains missing (NA) values?

The tibble below has a list-column property that contains some missing values:

library(tidyverse)

tbl = tibble(type = c('scale', 'range', 'min', 'max'), 
         property = list(list(lttr = letters, mth = month.name), NA) %>% 
           rep(., 2))
# A tibble: 4 x 2
  type  property  
  <chr> <list>    
1 scale <list [2]>
2 range <lgl [1]> 
3 min   <list [2]>
4 max   <lgl [1]> 

I would like to unnest this column and then spread the result into a wide format with three columns - type, lttr and mth:

tbl = tibble(type = c('scale', 'range', 'min', 'max'), 
             property = list(list(lttr = letters, mth = month.name), NA) %>% 
               rep(., 2)) %>% 
  mutate(property = map_if(property, is_list, enframe)) %>% 
  unnest(property) %>%
  spread(name, value)

However, the unnest call throws the following error:

Error: Each column must either be a list of vectors or a list of data frames [property]

I came across a similar issue on Git that asks unnest to support NULL values but makes no mention of NAs. There don't appear to be any arguments in the function documentation that pertain to missings either, but I could be wrong.

The pipeline works as expected if the NAs are filtered out:

tbl = tibble(type = c('scale', 'range', 'min', 'max'), 
             property = list(list(lttr = letters, mth = month.name), NA) %>% 
               rep(., 2)) %>% 
  mutate(property = map_if(property, is_list, enframe)) %>% 
  filter(!is.na(property)) %>% # drop_na() and na_omit not working not sure why
  unnest(property) %>%
  spread(name, value)

tbl
# A tibble: 2 x 3
  type  lttr       mth       
  <chr> <list>     <list>    
1 min   <chr [26]> <chr [12]>
2 scale <chr [26]> <chr [12]>

Upvotes: 1

Views: 1763

Answers (2)

akrun
akrun

Reputation: 887501

An option would be to convert everything into tibble so that while unnesting the structure would be the same across rather than manually subsetting

library(tidyverse)
tbl %>%
    mutate(property = map(property, ~ if(!is.list(.x))
        enframe(list(nm1 = .x)) else enframe(.x))) %>%
    unnest %>% 
    spread(name, value) %>%
    select(type, lttr, mth)
# A tibble: 4 x 3
#  type  lttr       mth       
#  <chr> <list>     <list>    
#1 max   <NULL>     <NULL>    
#2 min   <chr [26]> <chr [12]>
#3 range <NULL>     <NULL>    
#4 scale <chr [26]> <chr [12]>

The issue in the OP's example is that difference in structure for the NA rows when compared to the other rows. When we filter them out, the structure is same across and the issue got resolved


We can also check with another example where the number of list elements are greater than 2.

tbl1 <- tibble(type = c('scale', 'range', 'min', 'max'), 
      property = list(list(lttr = letters, mth = month.name, 
       val1 = rnorm(12), val2 = runif(12)), NA) %>% 
        rep(., 2))

tbl1 %>% 
   mutate(property = map(property, ~ if(!is.list(.x)) enframe(list(nm1 = .x)) 
          else enframe(.x))) %>% 
   unnest %>%
   spread(name, value) %>%
   select(-nm1)
# A tibble: 4 x 5
#  type  lttr       mth        val1       val2      
#  <chr> <list>     <list>     <list>     <list>    
#1 max   <NULL>     <NULL>     <NULL>     <NULL>    
#2 min   <chr [26]> <chr [12]> <dbl [12]> <dbl [12]>
#3 range <NULL>     <NULL>     <NULL>     <NULL>    
#4 scale <chr [26]> <chr [12]> <dbl [12]> <dbl [12]>

This can be extended to arbitrary number of elements

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389135

How about unnesting the tbl, group_by type and then create new columns with summarise ?

library(dplyr)
library(tidyr)

tbl %>%
  unnest() %>%
  group_by(type) %>%
  summarise(lttr = property[1L], 
            mth = property[2L])

#  type  lttr       mth       
#  <chr> <list>     <list>    
#1 max   <NULL>     <NULL>    
#2 min   <chr [26]> <chr [12]>
#3 range <NULL>     <NULL>    
#4 scale <chr [26]> <chr [12]>

Upvotes: 1

Related Questions