Reputation: 4040
I am using the JSON train file from Kaggle's https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data
to analyse the features and data, and to try other algorithms to see whether I can boost the accuracy.
For example, I have a column named features.
Sample:
l <- structure(list(`4` = c("Dining Room", "Pre-War", "Laundry in Building",
"Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"
), `6` = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
"Hardwood Floors", "No Fee"), `9` = c("Doorman", "Elevator",
"Laundry in Building", "Laundry in Unit", "Dishwasher", "Hardwood Floors"
), `10` = list(), `15` = c("Doorman", "Elevator", "Fitness Center",
"Laundry in Building")), .Names = c("4", "6", "9", "10", "15"
))
I want to build a dataframe that looks like this:
name nested list
4 <list = list(c("Dining Room", "Pre-War", "Laundry in Building",
"Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"))>
6 <list = list(c("Doorman", "Elevator", "Laundry in Building", "Dishwasher", "Hardwood Floors", "No Fee"))>
9 <list = list(c("Doorman", "Elevator",
"Laundry in Building", "Laundry in Unit", "Dishwasher", "Hardwood Floors"))>
10 <list = list(c())>
15 <list = list(c("Doorman", "Elevator", "Fitness Center",
"Laundry in Building"))>
Please advise how to do this; I am a bit confused about how to convert it.
My final goal is to build a data frame with the union of all these features, where each of 4, 6, 10, 15 ... has its own 1s and 0s depending on whether it has each feature, i.e. a one-hot encoding of them.
Upvotes: 0
Views: 95
Reputation: 51592
One way is to use the data.table::rbindlist() function, which has a fill = TRUE argument. This allows you to bind data frames with different numbers of columns. The trick in your case, however, is to get the empty list element to appear in the result as well. To achieve that, we add an if statement that creates an NA data frame for empty list elements, i.e.
library(data.table)
rbindlist(lapply(l, function(i) {
  d <- as.data.frame(t(i))
  # an empty element gives a zero-column data frame; use a single NA instead
  if (!ncol(d)) d <- data.frame(V1 = NA)
  d
}), fill = TRUE)
which gives,
            V1       V2                  V3                  V4              V5              V6           V7
1: Dining Room  Pre-War Laundry in Building          Dishwasher Hardwood Floors    Dogs Allowed Cats Allowed
2:     Doorman Elevator Laundry in Building          Dishwasher Hardwood Floors          No Fee         <NA>
3:     Doorman Elevator Laundry in Building     Laundry in Unit      Dishwasher Hardwood Floors         <NA>
4:        <NA>     <NA>                <NA>                <NA>            <NA>            <NA>         <NA>
5:     Doorman Elevator      Fitness Center Laundry in Building            <NA>            <NA>         <NA>
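Since the stated end goal is a one-hot encoding, here is a rough base R sketch of how that could be done directly from the list, without going through the wide table. It is not part of the rbindlist() approach; the names nested_df, all_features, one_hot and one_hot_df are made up for illustration, and the sample list is retyped from the question.

```r
# Sample input retyped from the question: a named list of character
# vectors, with one empty element ("10").
l <- list(
  `4`  = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
  `6`  = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "No Fee"),
  `9`  = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
           "Dishwasher", "Hardwood Floors"),
  `10` = list(),
  `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building")
)

# Data frame with a list column (hypothetical name: nested_df).
# I() keeps the list as a single column instead of expanding it.
nested_df <- data.frame(name = names(l), features = I(unname(l)))

# Union of every feature that appears in any element.
all_features <- sort(unique(unlist(l)))

# One row per list element, one 0/1 column per feature.
# unlist(x) turns the empty list() into NULL, so %in% is all FALSE there.
one_hot <- t(vapply(
  l,
  function(x) as.integer(all_features %in% unlist(x)),
  integer(length(all_features))
))
colnames(one_hot) <- all_features

# check.names = FALSE keeps column names such as "Dining Room" intact.
one_hot_df <- data.frame(name = names(l), one_hot, check.names = FALSE)
```

The empty element "10" ends up as a row of zeros, so it stays in the result without needing a special case.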
Upvotes: 1