user113156

Reputation: 7127

Setting names from list elements as a column when using bind_rows/data.frame

I am trying to process a number of lists and I am losing the names of some of the list elements.

The list looks like:

> myLists2
[[1]]
NULL

[[2]]
[[2]][[1]]
                         title                        company                     date_range                       location 
            "Founder | Co-CEO"                  "someCompany" "ene. de 2018 \023 actualidad"                       "Europe" 
                   description                 li_company_url 
          "some description 1"       "https://www.google.com" 

[[2]][[2]]
                         title                        company                     date_range                       location 
               "Another title"                 "someCompany2" "ene. de 2019 \023 actualidad"                          "USA" 
                   description                 li_company_url 
         "Another Description"        "https://www.yahoo.com" 

[[2]][[3]]
                          title                         company                      date_range                        location 
              "Another title 3"             "Another company 3" "sept. de 2018 \023 actualidad"                        "Europe" 
                    description                  li_company_url 
        "Another description 3" "https://www.stackexchange.com" 

If I run names(myLists2[[2]][[1]]), I get the following:

[1] "title"          "company"        "date_range"     "location"       "description"    "li_company_url"

The number of names can vary slightly across the different lists, and I would like to create a new column in a data.frame that holds these names.

Running:

hh <- myLists2[[2]] %>% data.frame() %>% rownames_to_column("tag")

This gives me a nice data frame, in which rownames_to_column() saves the row names as a column. However, it throws an error when the list elements have different lengths:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 5, 6

A solution I found to this was to use bind_rows(). Running:

myLists2[[2]] %>% bind_rows()

Gives me a tibble but I lose the names from the lists. Running:

myLists2[[2]] %>% bind_rows(.id = "myID")

Does not seem to solve the issue either since it just gives me a new column from 1 to 3.

My question is: how can I use bind_rows() (which is not sensitive to elements of differing lengths) and also save the names from the lists as a column?

Data:

myLists2 <- list(NULL, list(c(title = "Founder | Co-CEO", company = "someCompany", 
                              date_range = "ene. de 2018 \023 actualidad", location = "Europe", 
                              description = "some description 1", li_company_url = "https://www.google.com"
), c(title = "Another title", company = "someCompany2", date_range = "ene. de 2019 \023 actualidad", 
     location = "USA", description = "Another Description", li_company_url = "https://www.yahoo.com"
), c(title = "Another title 3", company = "Another company 3", 
     date_range = "sept. de 2018 \023 actualidad", location = "Europe", 
     description = "Another description 3", li_company_url = "https://www.stackexchange.com"
)))

EDIT: (Adding a new list)

myNewList <- list(list(c(title = "Founder | Co-CEO", company = "some company", 
            date_range = "ene. de 2018 \023 actualidad", location = "Europe", 
            description = "some description", 
            li_company_url = "https://www.google.com"
), c(title = "some thing here", company = "some company", 
     date_range = "ene. de 2019 \023 actualidad", location = "USA", 
     description = "another description", 
     li_company_url = "https://www.yahoo.com")
), list(c(title = "CEO", company = "another company", 
           date_range = "2012 \023 actualidad", description = "some other description", 
           li_company_url = ""), c(title = "job title", 
                                   company = "company name", date_range = "ene. de 2005 \023 actualidad", 
                                   location = "Europe", description = "company description", 
                                   li_company_url = "https://www.yahoo.com"), 
         c(title = "job title 2", company = "company name", date_range = "1995 \023 actualidad", 
           description = "description", 
           li_company_url = ""), c(title = "job title", 
                                   company = "company name", date_range = "1992 \023 1995", 
                                   location = "USA", description = "soem company description", 
                                   li_company_url = ""), c(title = "company title", company = "company name", 
                                                           date_range = "1990 \023 1992", description = "Another description", 
                                                           li_company_url = "")), NULL)

These show the problems I am running into:

map(myNewList, ~data.frame(.x))
map(myNewList[1], ~data.frame(.x)) # runs okay and I keep the names
map(myNewList[2], ~data.frame(.x)) # errors
map(myNewList, ~bind_rows(.x))    # runs okay but I lose the names

Upvotes: 3

Views: 1322

Answers (3)

iago

Reputation: 3266

Another possibility using only purrr, dplyr and tibble:

myNewList %>%
    map_if(~!is.null(.), 
           function(mylist) map(mylist, 
                                ~data.frame(.x) %>% 
                                     rownames_to_column("tag")) %>% 
                                 reduce(full_join, by = "tag"))
[[1]]
             tag                         .x.x                         .x.y
1          title             Founder | Co-CEO              some thing here
2        company                 some company                 some company
3     date_range ene. de 2018 \023 actualidad ene. de 2019 \023 actualidad
4       location                       Europe                          USA
5    description             some description          another description
6 li_company_url       https://www.google.com        https://www.yahoo.com

[[2]]
             tag                   .x.x                         .x.y               .x.x.x                   .x.y.y                  .x
1          title                    CEO                    job title          job title 2                job title       company title
2        company        another company                 company name         company name             company name        company name
3     date_range   2012 \023 actualidad ene. de 2005 \023 actualidad 1995 \023 actualidad           1992 \023 1995      1990 \023 1992
4    description some other description          company description          description soem company description Another description
5 li_company_url                               https://www.yahoo.com                                                                  
6       location                   <NA>                       Europe                 <NA>                      USA                <NA>

[[3]]
NULL

Or removing empty lists:

inter_list <- map(myNewList, function(mylist) map(mylist, ~data.frame(.x) %>% rownames_to_column("tag")))
nullw <- which(map_lgl(inter_list, ~length(.x)==0))
if(length(nullw)!=0) inter_list <- inter_list[-nullw]
map(inter_list, ~reduce(.x, full_join, by = "tag"))

[[1]]
             tag                         .x.x                         .x.y
1          title             Founder | Co-CEO              some thing here
2        company                 some company                 some company
3     date_range ene. de 2018 \023 actualidad ene. de 2019 \023 actualidad
4       location                       Europe                          USA
5    description             some description          another description
6 li_company_url       https://www.google.com        https://www.yahoo.com

[[2]]
             tag                   .x.x                         .x.y               .x.x.x                   .x.y.y                  .x
1          title                    CEO                    job title          job title 2                job title       company title
2        company        another company                 company name         company name             company name        company name
3     date_range   2012 \023 actualidad ene. de 2005 \023 actualidad 1995 \023 actualidad           1992 \023 1995      1990 \023 1992
4    description some other description          company description          description soem company description Another description
5 li_company_url                               https://www.yahoo.com                                                                  
6       location                   <NA>                       Europe                 <NA>                      USA                <NA>
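If a long format is acceptable, the names can also be kept as a column while still using bind_rows(). The following is only a minimal sketch and not part of the answer above. It assumes each inner element is a named character vector; the `jobs` data, the `tag` and `value` column names, and the `job` id column are illustrative choices.

```r
library(purrr)
library(dplyr)
library(tibble)

# Illustrative data: two named vectors with different sets of names
jobs <- list(
  c(title = "CEO", company = "another company",
    description = "some other description"),
  c(title = "job title", company = "company name",
    location = "Europe", description = "company description")
)

# enframe() turns each named vector into a two-column tibble (name, value),
# so the names survive as data; bind_rows(.id = ) then records which
# vector each row came from.
long <- jobs %>%
  map(~ enframe(.x, name = "tag", value = "value")) %>%
  bind_rows(.id = "job")

long
```

Because every piece has the same two columns, differing numbers of names are not a problem here. Applying the same pipeline to each non-NULL element of myNewList would give one long tibble per element.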

Upvotes: 0

akrun

Reputation: 887431

We could use map_if with data.table::transpose after doing the bind_rows:

library(purrr)
library(dplyr)
library(tibble)
library(data.table)
map_if(myNewList, .p = ~ length(.) > 0,
                .f =  ~bind_rows(.x) %>% 
                         data.table::transpose(., keep.names = 'title') %>%
                         column_to_rownames('title'),
                 .else = ~ NA_character_)

Output:

#[[1]]
#                                         V1                           V2
#title                      Founder | Co-CEO              some thing here
#company                        some company                 some company
#date_range     ene. de 2018 \023 actualidad ene. de 2019 \023 actualidad
#location                             Europe                          USA
#description                some description          another description
#li_company_url       https://www.google.com        https://www.yahoo.com

#[[2]]
#                                   V1                           V2                   V3                       V4
#title                             CEO                    job title          job title 2                job title
#company               another company                 company name         company name             company name
#date_range       2012 \023 actualidad ene. de 2005 \023 actualidad 1995 \023 actualidad           1992 \023 1995
#description    some other description          company description          description soem company description
#li_company_url                               https://www.yahoo.com                                              
#location                         <NA>                       Europe                 <NA>                      USA
#                                V5
#title                company title
#company               company name
#date_range          1990 \023 1992
#description    Another description
#li_company_url                    
#location                      <NA>

#[[3]]
#[1] NA

Upvotes: 1

Duck

Reputation: 39613

After trying many options, I found a rustic method that gets what you want. It uses the rbind.fill() function from plyr, so be careful when loading the package, as it conflicts with dplyr. The main idea (which uses a loop) is to transform each named vector into a one-row data frame so that the names become columns, then bind the rows so that missing values are filled with NA (that is why we use the plyr function), and finally transpose the result. The advantage is that inside the loop you can handle the NULL elements with a conditional. Here is the code with the new data you shared:

library(plyr)
# Create a list to store the results
List <- list()
# Loop over the top-level elements
for (i in seq_along(myNewList))
{
  v <- length(myNewList[[i]])
  # NULL (or empty) elements have length 0 and are stored as NA
  if (v == 0)
  {
    List[[i]] <- NA
  } else
  {
    # First turn each named vector into a one-row data frame (names become columns)
    # This makes the pieces easy to bind
    O1 <- lapply(myNewList[[i]], function(x) as.data.frame(t(x)))
    # Now bind the rows with rbind.fill, which fills missing columns with NA
    O2 <- do.call(rbind.fill, O1)
    # Finally transpose so the names become row names again
    O3 <- as.data.frame(t(O2))
    # Save in List
    List[[i]] <- O3
  }
}

Output:

List
[[1]]
                                         V1                           V2
title                      Founder | Co-CEO              some thing here
company                        some company                 some company
date_range     ene. de 2018 \023 actualidad ene. de 2019 \023 actualidad
location                             Europe                          USA
description                some description          another description
li_company_url       https://www.google.com        https://www.yahoo.com

[[2]]
                                   V1                           V2                   V3
title                             CEO                    job title          job title 2
company               another company                 company name         company name
date_range       2012 \023 actualidad ene. de 2005 \023 actualidad 1995 \023 actualidad
description    some other description          company description          description
li_company_url                               https://www.yahoo.com                     
location                         <NA>                       Europe                 <NA>
                                     V4                  V5
title                         job title       company title
company                    company name        company name
date_range               1992 \023 1995      1990 \023 1992
description    soem company description Another description
li_company_url                                             
location                            USA                <NA>

[[3]]
[1] NA
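The same fill-with-NA idea can also be sketched in base R, without plyr, while keeping the names in a column as the question asks. This is only an illustration and not part of the answer above: `names_to_tag_df` is a hypothetical helper, the `tag` and `V1`, `V2`, ... column names are assumptions, and it expects every inner element to be a named character vector.

```r
# Hypothetical helper: turn a list of named character vectors into a data
# frame with one row per name and one column per vector, with the names
# stored in a "tag" column.
names_to_tag_df <- function(vectors) {
  if (length(vectors) == 0) {
    return(NULL)  # NULL or empty elements are passed through as NULL
  }
  # Union of all names, in order of first appearance
  tags <- unique(unlist(lapply(vectors, names)))
  # Indexing a named vector by a name it lacks yields NA, which does the filling
  cols <- lapply(vectors, function(v) unname(v[tags]))
  names(cols) <- paste0("V", seq_along(cols))
  data.frame(tag = tags, cols, stringsAsFactors = FALSE)
}

# Illustrative data: the first vector has no "location"
jobs <- list(
  c(title = "CEO", company = "another company",
    description = "some other description"),
  c(title = "job title", company = "company name",
    location = "Europe", description = "company description")
)

res <- names_to_tag_df(jobs)
res
```

To process the whole structure, something like lapply(myNewList, names_to_tag_df) should work, with NULL elements returned as NULL rather than NA.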

Upvotes: 1
