Canovice
Canovice

Reputation: 10173

In R, prevent unlist from removing NULL values, by replacing NULL with NA in nested list

Here is 1 row of data we are fetching from a sports API that comes into us as a nested list. Our fetch_results$data is a list with a nested lists like this for each of many games, as this data is for many soccer matches. The list-of-list nesting can go 3-4 layers deep, with inner lists for scores, and time, and visitorTeam below, and more.

> dput(fetch_results$data[1])
list(list(id = 11984409L, league_id = 1326L, season_id = 15733L, 
    stage_id = 77442469L, round_id = 186274L, group_id = 225400L, 
    aggregate_id = NULL, venue_id = 7189L, referee_id = NULL, 
    localteam_id = 18716L, visitorteam_id = 18658L, winner_team_id = NULL, 
    weather_report = NULL, commentaries = FALSE, attendance = NULL, 
    pitch = NULL, details = "Match 1", neutral_venue = FALSE, 
    winning_odds_calculated = FALSE, formations = list(localteam_formation = NULL, 
        visitorteam_formation = NULL), scores = list(localteam_score = 0L, 
        visitorteam_score = 0L, localteam_pen_score = NULL, visitorteam_pen_score = NULL, 
        ht_score = NULL, ft_score = NULL, et_score = NULL, ps_score = NULL), 
    time = list(status = "NS", starting_at = list(date_time = "2021-06-11 19:00:00", 
        date = "2021-06-11", time = "19:00:00", timestamp = 1623438000L, 
        timezone = "UTC"), minute = NULL, second = NULL, added_time = NULL, 
        extra_minute = NULL, injury_time = NULL), coaches = list(
        localteam_coach_id = 455836L, visitorteam_coach_id = 784486L), 
    standings = list(localteam_position = 3L, visitorteam_position = 1L), 
    assistants = list(first_assistant_id = NULL, second_assistant_id = NULL, 
        fourth_official_id = NULL), leg = "1/1", colors = NULL, 
    deleted = FALSE, is_placeholder = FALSE, localTeam = list(
        data = list(id = 18716L, legacy_id = 213L, name = "Turkey", 
            short_code = "TUR", twitter = NULL, country_id = 404L, 
            national_team = TRUE, founded = 1923L, logo_path = "https://cdn.sportmonks.com/images//soccer/teams/28/18716.png", 
            venue_id = 9634L, current_season_id = 15733L, is_placeholder = NULL)), 
    visitorTeam = list(data = list(id = 18658L, legacy_id = 205L, 
        name = "Italy", short_code = "ITA", twitter = NULL, country_id = 251L, 
        national_team = TRUE, founded = 1898L, logo_path = "https://cdn.sportmonks.com/images//soccer/teams/2/18658.png", 
        venue_id = 7189L, current_season_id = 15733L, is_placeholder = NULL))))

To flatten into a dataframe, we use:

zed <- fetch_results$data %>%
    purrr::map(unlist) %>% 
    purrr::map(t) %>% 
    purrr::map(as_tibble) %>% 
    dplyr::bind_rows() %>%
    readr::type_convert()

One row of our dataframe output looks like this:

enter image description here

If you look closer at the list of lists, there are many objects with value NULL that are dropped in the main dataframe. The entire score list with all of its keys is dropped. According to this stackoverflow post, it looks like unlist() drops the NULL values is the culprit...

The solutions posted in that thread only address NULL values that are 1-layer nested deep, however the list above has many nested lists which you can see if you search for list( above.

What is the best way to flatten this list of lists without dropping any of the columns with NULL values? If the best way is to first replace NULLs with NAs, what is the best way to do this? Our existing code does the flattening and gets close, but doesn't keep the columns with NULLs.

Upvotes: 2

Views: 629

Answers (1)

akrun
akrun

Reputation: 887173

An option is to convert to NA before we do anything. This can be done in a recursive way with rrapply

library(rrapply)
library(purrr)
library(dplyr)
zed <-  rrapply(fetch_results$data, f = function(x) 
               replace(x, is.null(x), NA)) %>% 
        map(unlist) %>%
        map(t) %>%
        map(as_tibble) %>%
        bind_rows() %>% 
        type.convert(as.is = TRUE)

-output

# A tibble: 1 x 75
        id league_id season_id stage_id round_id group_id aggregate_id venue_id referee_id localteam_id visitorteam_id winner_team_id weather_report
     <int>     <int>     <int>    <int>    <int>    <int> <lgl>           <int> <lgl>             <int>          <int> <lgl>          <lgl>         
1 11984409      1326     15733 77442469   186274   225400 NA               7189 NA                18716          18658 NA             NA            
# … with 62 more variables: commentaries <lgl>, attendance <lgl>, pitch <lgl>, details <chr>, neutral_venue <lgl>, winning_odds_calculated <lgl>,
#   formations.localteam_formation <lgl>, formations.visitorteam_formation <lgl>, scores.localteam_score <int>, scores.visitorteam_score <int>,
#   scores.localteam_pen_score <lgl>, scores.visitorteam_pen_score <lgl>, scores.ht_score <lgl>, scores.ft_score <lgl>, scores.et_score <lgl>,
#   scores.ps_score <lgl>, time.status <chr>, time.starting_at.date_time <chr>, time.starting_at.date <chr>, time.starting_at.time <chr>,
#   time.starting_at.timestamp <int>, time.starting_at.timezone <chr>, time.minute <lgl>, time.second <lgl>, time.added_time <lgl>, time.extra_minute <lgl>,
#   time.injury_time <lgl>, coaches.localteam_coach_id <int>, coaches.visitorteam_coach_id <int>, standings.localteam_position <int>,
#   standings.visitorteam_position <int>, assistants.first_assistant_id <lgl>, assistants.second_assistant_id <lgl>, assistants.fourth_official_id <lgl>,
#   leg <chr>, colors <lgl>, deleted <lgl>, is_placeholder <lgl>, localTeam.data.id <int>, localTeam.data.legacy_id <int>, localTeam.data.name <chr>,
#   localTeam.data.short_code <chr>, localTeam.data.twitter <lgl>, localTeam.data.country_id <int>, localTeam.data.national_team <lgl>,
#   localTeam.data.founded <int>, localTeam.data.logo_path <chr>, localTeam.data.venue_id <int>, localTeam.data.current_season_id <int>,
#   localTeam.data.is_placeholder <lgl>, visitorTeam.data.id <int>, visitorTeam.data.legacy_id <int>, visitorTeam.data.name <chr>,
#   visitorTeam.data.short_code <chr>, visitorTeam.data.twitter <lgl>, visitorTeam.data.country_id <int>, visitorTeam.data.national_team <lgl>,
#   visitorTeam.data.founded <int>, visitorTeam.data.logo_path <chr>, visitorTeam.data.venue_id <int>, visitorTeam.data.current_season_id <int>,
#   visitorTeam.data.is_placeholder <lgl>

Upvotes: 3

Related Questions