SteveS

Reputation: 4040

How to extract the list names and values to a dataframe

I am using the JSON train file from Kaggle's https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data competition to analyse the features and data, and to apply other algorithms to check whether I can boost the accuracy.

For example, I have a features column. Here is a sample:

    l <- structure(list(`4` = c("Dining Room", "Pre-War", "Laundry in Building",
                                "Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
                        `6` = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
                                "Hardwood Floors", "No Fee"),
                        `9` = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
                                "Dishwasher", "Hardwood Floors"),
                        `10` = list(),
                        `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building")),
                   .Names = c("4", "6", "9", "10", "15"))

I want to build a data frame that looks like this:

    name   nested list
    4      <list = list(c("Dining Room", "Pre-War", "Laundry in Building",
                          "Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"))>
    6      <list = list(c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
                          "Hardwood Floors", "No Fee"))>
    9      <list = list(c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
                          "Dishwasher", "Hardwood Floors"))>
    10     <list = list(c())>
    15     <list = list(c("Doorman", "Elevator", "Fitness Center", "Laundry in Building"))>
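For reference, here is a minimal base R sketch of that intermediate shape, assuming the sample list l shown above (rebuilt with list() so the block runs on its own). The column names name and features are illustrative choices, not anything from the Kaggle data. The features column is a list-column, so each cell holds one listing's feature vector, and listing 10 keeps its empty element.

```r
# Sample input from the question, rebuilt with list() (same names and contents)
l <- list(`4`  = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
                   "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
          `6`  = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
                   "Hardwood Floors", "No Fee"),
          `9`  = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
                   "Dishwasher", "Hardwood Floors"),
          `10` = list(),
          `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building"))

# One row per listing; the column names name/features are illustrative assumptions
df <- data.frame(name = names(l), stringsAsFactors = FALSE)

# Assigning a list whose length equals nrow(df) creates a list-column:
# each cell holds that listing's character vector of features
df$features <- l

str(df)
```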

Please advise how to do this; I am a bit confused about how to convert it.

My final goal is to build a data frame whose columns are the union of all these features, where each listing (4, 6, 10, 15 ...) gets its own 1s and 0s depending on whether it has each feature, i.e. a one-hot encoding of them.
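A minimal base R sketch of that one-hot encoding, assuming the same sample list l (rebuilt here so the block is self-contained). It takes the union of all feature strings as columns and marks each listing with 1 or 0; the object names feats, onehot and onehot_df are illustrative. This is a swapped-in base R approach, not the one used in the answer.

```r
# Sample input from the question, rebuilt with list() (same names and contents)
l <- list(`4`  = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
                   "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
          `6`  = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
                   "Hardwood Floors", "No Fee"),
          `9`  = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
                   "Dishwasher", "Hardwood Floors"),
          `10` = list(),
          `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building"))

# Union of all features across listings, sorted for a stable column order
feats <- sort(unique(unlist(l)))

# For each listing: 1 if it has the feature, 0 otherwise.
# vapply returns a features x listings matrix (columns named after the listings),
# so transpose it to get listings x features.
onehot <- t(vapply(l, function(x) as.integer(feats %in% unlist(x)),
                   integer(length(feats))))
colnames(onehot) <- feats

# Optional: a data frame with the listing name as its own column;
# check.names = FALSE keeps feature names such as "Dining Room" unchanged
onehot_df <- data.frame(name = rownames(onehot), onehot,
                        check.names = FALSE, stringsAsFactors = FALSE,
                        row.names = NULL)
```

Listing 10, which has no features, comes out as a row of zeros instead of being dropped.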

Please advise.

Upvotes: 0

Views: 95

Answers (1)

Sotos

Reputation: 51592

One way is to use the data.table::rbindlist() function, which has a fill = TRUE argument. This allows you to bind data frames with different numbers of columns. The trick in your case, however, is to get the empty list element to appear as a row as well. To achieve that, we add an if statement that creates a one-column NA data frame for empty list elements, i.e.

    library(data.table)
    rbindlist(lapply(l, function(i) {
      d <- as.data.frame(t(i))                  # one row, one column per feature
      if (!ncol(d)) d <- data.frame(V1 = NA)    # empty element -> a single NA row
      d
    }), fill = TRUE)

which gives:

                V1       V2                  V3                  V4              V5              V6           V7
    1: Dining Room  Pre-War Laundry in Building          Dishwasher Hardwood Floors    Dogs Allowed Cats Allowed
    2:     Doorman Elevator Laundry in Building          Dishwasher Hardwood Floors          No Fee         <NA>
    3:     Doorman Elevator Laundry in Building     Laundry in Unit      Dishwasher Hardwood Floors         <NA>
    4:        <NA>     <NA>                <NA>                <NA>            <NA>            <NA>         <NA>
    5:     Doorman Elevator      Fitness Center Laundry in Building            <NA>            <NA>         <NA>

Upvotes: 1
