Dyem

Reputation: 9

R Filling missing values with NA for a data frame

I am currently trying to create a data frame from the following lists:

location <- list("USA","Singapore","UK")
organization <- list("Microsoft","University of London","Boeing","Apple")
person <- list()
date <- list("1989","2001","2018")
Jobs <- list("CEO","Chairman","VP of sales","General Manager","Director")

When I try to create a data frame, I get the (obvious) error that the lengths of the lists are not equal. I want to find a way to either make the lists the same length or fill the missing data frame entries with NA. After some searching, I have not been able to find a solution.

Upvotes: 0

Views: 3305

Answers (2)

s_baldur

Reputation: 33488

You could do:

data.frame(sapply(dyem_list, "length<-", max(lengths(dyem_list))))

   location         organization person date            Jobs
1       USA            Microsoft   NULL 1989             CEO
2 Singapore University of London   NULL 2001        Chairman
3        UK               Boeing   NULL 2018     VP of sales
4      NULL                Apple   NULL NULL General Manager
5      NULL                 NULL   NULL NULL        Director

Where dyem_list is the following:

dyem_list <- list(
  location = list("USA","Singapore","UK"),
  organization = list("Microsoft","University of London","Boeing","Apple"),
  person = list(),
  date = list("1989","2001","2018"),
  Jobs = list("CEO","Chairman","VP of sales","General Manager","Director")
)
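Note that the output above pads with NULL rather than NA, because each column is still a list. As a minimal sketch of the same `length<-` idea that yields real NA values, one option is to flatten each list to a character vector first. The names n, padded and df below are illustrative and not part of the original answer:

```r
dyem_list <- list(
  location = list("USA", "Singapore", "UK"),
  organization = list("Microsoft", "University of London", "Boeing", "Apple"),
  person = list(),
  date = list("1989", "2001", "2018"),
  Jobs = list("CEO", "Chairman", "VP of sales", "General Manager", "Director")
)

# Length of the longest list; every column is padded to this length
n <- max(lengths(dyem_list))

padded <- lapply(dyem_list, function(x) {
  v <- as.character(unlist(x))  # an empty list() becomes character(0)
  length(v) <- n                # lengthening an atomic vector pads with NA
  v
})

# Equal-length character vectors combine into a data frame directly
df <- as.data.frame(padded, stringsAsFactors = FALSE)
```

Because the list is named, the column names carry over, and is.na(df) should flag the padded cells.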

Upvotes: 1

camille

Reputation: 16842

Here are purrr (part of the tidyverse) and base R solutions, assuming you just want to fill the remaining values in each list with NA. I take the maximum length of any list as len, then pad each list with rep(NA, ...), repeated once for each element it falls short of len.

library(tidyverse)

location <- list("USA","Singapore","UK")
organization <- list("Microsoft","University of London","Boeing","Apple")
person <- list()
date <- list("1989","2001","2018")
Jobs <- list("CEO","Chairman","VP of sales","General Manager","Director")

all_lists <- list(location, organization, person, date, Jobs)
len <- max(lengths(all_lists))

With purrr::map_dfc, you can map over the list of lists, tack on NAs as needed, convert to character vector, then get a data frame of all those vectors cbinded in one piped call:

map_dfc(all_lists, function(l) {
  # NA_character_ keeps the padding as true missing values after as.character()
  c(l, rep(NA_character_, len - length(l))) %>%
    as.character()
})
#> # A tibble: 5 x 5
#>   V1        V2                   V3    V4    V5             
#>   <chr>     <chr>                <chr> <chr> <chr>          
#> 1 USA       Microsoft            NA    1989  CEO            
#> 2 Singapore University of London NA    2001  Chairman       
#> 3 UK        Boeing               NA    2018  VP of sales    
#> 4 NA        Apple                NA    NA    General Manager
#> 5 NA        NA                   NA    NA    Director

In base R, you can lapply the same function across the list of lists, then use Reduce to cbind the resulting lists and convert the result to a data frame. This takes two steps instead of purrr's one:

cols <- lapply(all_lists, function(l) c(l, rep(NA, len - length(l))))
as.data.frame(Reduce(cbind, cols, init = NULL))
#>          V1                   V2 V3   V4              V5
#> 1       USA            Microsoft NA 1989             CEO
#> 2 Singapore University of London NA 2001        Chairman
#> 3        UK               Boeing NA 2018     VP of sales
#> 4        NA                Apple NA   NA General Manager
#> 5        NA                   NA NA   NA        Director

For both of these, you can now set the names however you like.
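As a rough sketch of that last step for the base R version: the names can be assigned with names(df) <- .... Since cbind-ing lists may leave list columns in the data frame (worth checking with str()), the example also flattens each column to a character vector. The name col_names is hypothetical and chosen here only for illustration:

```r
all_lists <- list(
  list("USA", "Singapore", "UK"),
  list("Microsoft", "University of London", "Boeing", "Apple"),
  list(),
  list("1989", "2001", "2018"),
  list("CEO", "Chairman", "VP of sales", "General Manager", "Director")
)
len <- max(lengths(all_lists))

# Same two steps as above: pad each list, then cbind into a data frame
cols <- lapply(all_lists, function(l) c(l, rep(NA, len - length(l))))
df <- as.data.frame(Reduce(cbind, cols, init = NULL))

# Replace the default V1..V5 names (col_names is an illustrative choice)
col_names <- c("location", "organization", "person", "date", "Jobs")
names(df) <- col_names

# Flatten each column to a plain character vector; unlist() keeps NA as missing
df[] <- lapply(df, function(col) as.character(unlist(col)))
```

After this, df$person is a character column of NA values, so the data frame behaves like one built from ordinary vectors.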

Upvotes: 1
