Reputation: 13
I would like to condense this dataframe which is an output from reading a JSON file to remove the NULL values. Currently when unnesting the dataframe I end up with unnecessary NA values. I tried compact in purr and I tried some is.null operations to no avail.
The dataframe output after reading the json file looks something like this:
col1 col2 col3
1 c(a,b,c) NULL NULL
2 NULL c(d,e,f) NULL
3 NULL NULL c(g,h,i)
And I want it to look like this:
col1 col2 col3
1 c(a,b,c) c(d,e,f) c(g,h,i)
So that when I unnest the dataframe I get:
col1 col2 col3
1 a d g
2 b e h
3 c f i
I know there must be a correct function for this, but I still haven't found it! Apologies for any formatting errors on this post, still new to this!
Upvotes: 1
Views: 272
Reputation: 576
If I understand your question correctly, your data looks like this:
tibble(col1 = list(c("a", "b", "c"), NULL, NULL),
col2 = list(NULL, c("d", "e", "f"), NULL),
col3 = list(NULL, NULL, c("g", "h", "i")))
# A tibble: 3 x 3
col1 col2 col3
<list> <list> <list>
1 <chr [3]> <NULL> <NULL>
2 <NULL> <chr [3]> <NULL>
3 <NULL> <NULL> <chr [3]>
That is, something like a dataframe or tibble with list columns, where some of the cells contain some (character) vector and other are NULL
and you want to reduce this to a single row with only the vector and leave the nulls out.
My solution is to use dplyr::summarise
and purrr::compact
to reduce the rows to a single row and to get rid of those pesky NULL
s respectively:
tibble(col1 = list(c("a", "b", "c"), NULL, NULL),
col2 = list(NULL, c("d", "e", "f"), NULL),
col3 = list(NULL, NULL, c("g", "h", "i"))) %>%
summarise_all(compact)
# A tibble: 1 x 3
col1 col2 col3
<list> <list> <list>
1 <chr [3]> <chr [3]> <chr [3]>
Unnesting this will give you
# A tibble: 3 x 3
col1 col2 col3
<chr> <chr> <chr>
1 a d g
2 b e h
3 c f i
Note: This only works as long as you have a single non-null in each of your column.
Upvotes: 1