aruntailor
aruntailor

Reputation: 13

Flattening a dataframe with nested lists containing NULL values in R

I would like to condense this dataframe which is an output from reading a JSON file to remove the NULL values. Currently when unnesting the dataframe I end up with unnecessary NA values. I tried compact in purr and I tried some is.null operations to no avail.

The dataframe output after reading the json file looks something like this:

    col1        col2       col3
1   c(a,b,c)    NULL       NULL
2   NULL        c(d,e,f)   NULL
3   NULL        NULL       c(g,h,i)

And I want it to look like this:

    col1        col2       col3
1   c(a,b,c)    c(d,e,f)   c(g,h,i)

So that when I unnest the dataframe I get:

    col1   col2   col3
1   a      d      g
2   b      e      h
3   c      f      i

I know there must be a correct function for this, but I still haven't found it! Apologies for any formatting errors on this post, still new to this!

Upvotes: 1

Views: 272

Answers (1)

koenniem
koenniem

Reputation: 576

If I understand your question correctly, your data looks like this:

tibble(col1 = list(c("a", "b", "c"), NULL, NULL), 
       col2 = list(NULL, c("d", "e", "f"), NULL),
       col3 = list(NULL, NULL, c("g", "h", "i"))) 

# A tibble: 3 x 3
  col1      col2      col3     
  <list>    <list>    <list>   
1 <chr [3]> <NULL>    <NULL>   
2 <NULL>    <chr [3]> <NULL>   
3 <NULL>    <NULL>    <chr [3]>

That is, something like a dataframe or tibble with list columns, where some of the cells contain some (character) vector and other are NULL and you want to reduce this to a single row with only the vector and leave the nulls out.

My solution is to use dplyr::summarise and purrr::compact to reduce the rows to a single row and to get rid of those pesky NULLs respectively:

tibble(col1 = list(c("a", "b", "c"), NULL, NULL), 
       col2 = list(NULL, c("d", "e", "f"), NULL),
       col3 = list(NULL, NULL, c("g", "h", "i"))) %>% 
    summarise_all(compact) 

# A tibble: 1 x 3
  col1      col2      col3     
  <list>    <list>    <list>   
1 <chr [3]> <chr [3]> <chr [3]>

Unnesting this will give you

# A tibble: 3 x 3
  col1  col2  col3 
  <chr> <chr> <chr>
1 a     d     g    
2 b     e     h    
3 c     f     i    

Note: This only works as long as you have a single non-null in each of your column.

Upvotes: 1

Related Questions