Reputation: 89
Say I have a column with Id of a product and a list of data frames with characteristics about them:
bundle
1 284993459
2 1048768805
3 511310430
4 1034630958
5 1235581326
[[1]]
id value
1 35 0.2
2 1462 0.2
3 1109 0.2
4 220 0.2
5 211 0.1
[[2]]
list()
[[3]]
id name value
1 394 0.5
2 1462 0.5
[[4]]
id name value
1 926 0.3
2 1462 0.3
3 381 0.3
4 930 0.2
[[5]]
id name value
1 926 0.5
2 1462 0.5
I need to create columns with all characteristics ID and their values for each product.
bundle = data.frame(bundle = c(284993459,1048768805,511310430,1034630958,1235581326))
d2<- list(data.frame(id = c(35,1462,1109,220,211), value = c(0.2, 0.2, 0.2,0.2,0.1)),
data.frame(id = NULL, value = NULL),
data.frame(id = c(394,1462), value = c(0.5,0.5)),
data.frame(id = c(926,1462,381,930), value = c(0.3,0.3,0.3,0.2)),
data.frame(id = c(926,1462), value = c(0.5,0.5)))
bundle 35 1462 1109 220 211 394 1462
1 284993459 0.2 0.2 0.2 0.2 0.1 0 0
2 1048768805 0 0 0 0 0 0 0
3 511310430 0 0 0 0 0 0.5 0.5
Can't figure out how to do this. Had an idea to unlist this data frame list, but no good came of it, since a have more than 8000 prodict IDs:
for (i in seq(d2))
assign(paste0("df", i), d2[[i]])
If we take a different approach than I have to to join transposed characteristics data frames so the values are filled row by row.
Upvotes: 5
Views: 279
Reputation: 42544
There are two possible approaches which differ only in the sequence of operations:
rbind()
matching columns.rbind()
all dataframes in long form and reshape to wide format afterwards.Both approaches require to include bundle
somehow.
For the sake of completeness, here are different implementations of the second approach using data.table
.
library(data.table)
library(magrittr)
d2 %>%
# bind row-wise into large data.table, create id column
rbindlist(idcol = "bid") %>%
# right join to append bundle column
setDT(bundle)[, bid := .I][., on = "bid"] %>%
# reshape from long to wide format
dcast(., bundle ~ id, fill = 0)
bundle 35 211 220 381 394 926 930 1109 1462 1: 284993459 0.2 0.1 0.2 0.0 0.0 0.0 0.0 0.2 0.2 2: 511310430 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5 3: 1034630958 0.0 0.0 0.0 0.3 0.0 0.3 0.2 0.0 0.3 4: 1235581326 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5
Here, piping is used just to visualize the sequence of function calls. With data.table
's chaining the statement becomes more concise:
library(data.table) # library(magrittr) not required
setDT(bundle)[, bid := .I][
rbindlist(d2, id = "bid"), on = "bid"][, dcast(.SD, bundle ~ id, fill = 0)]
or
library(data.table) # library(magrittr) not required
dcast(setDT(bundle)[, bid := .I][
rbindlist(d2, id = "bid"), on = "bid"], bundle ~ id, fill = 0)
Another variant is to rename the list elements before calling rbindlist()
which will take the names for creating the idcol
:
library(data.table)
library(magrittr)
d2 %>%
# rename list elements
setNames(bundle$bundle) %>%
# bind row-wise into large data.table, create id column from element names
rbindlist(idcol = "bundle") %>%
# convert bundle from character to factor to maintain original order
.[, bundle := forcats::fct_inorder(bundle)] %>%
# reshape from long to wide format
dcast(., bundle ~ id, fill = 0)
bundle 35 211 220 381 394 926 930 1109 1462 1: 284993459 0.2 0.1 0.2 0.0 0.0 0.0 0.0 0.2 0.2 2: 511310430 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5 3: 1034630958 0.0 0.0 0.0 0.3 0.0 0.3 0.2 0.0 0.3 4: 1235581326 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5
Note that the variants presented so far have skipped the empty dataframe which belongs to bundle
1048768805 (likewise the answers by Moody_Mudskipper and chinsoon12).
In order to keep the empty dataframe in the final result, the order of the join has to be changed so that all rows of bundle
will be kept:
library(data.table)
dcast(
rbindlist(d2, id = "bid")[setDT(bundle)[, bid := .I], on = "bid"],
bundle ~ id, fill = 0
)[, "NA" := NULL][]
bundle 35 211 220 381 394 926 930 1109 1462 1: 284993459 0.2 0.1 0.2 0.0 0.0 0.0 0.0 0.2 0.2 2: 511310430 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5 3: 1034630958 0.0 0.0 0.0 0.3 0.0 0.3 0.2 0.0 0.3 4: 1048768805 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5: 1235581326 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5
Or, if the exact order of bundle
is to be maintained:
library(data.table)
dcast(
rbindlist(d2, id = "bid")[setDT(bundle)[, bid := .I], on = "bid"],
bid + bundle ~ id, fill = 0
)[, c("bid", "NA") := NULL][]
bundle 35 211 220 381 394 926 930 1109 1462 1: 284993459 0.2 0.1 0.2 0.0 0.0 0.0 0.0 0.2 0.2 2: 1048768805 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3: 511310430 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5 4: 1034630958 0.0 0.0 0.0 0.3 0.0 0.3 0.2 0.0 0.3 5: 1235581326 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5
Upvotes: 0
Reputation: 11955
Another approach could be
library(data.table)
library(tidyverse)
df <- rbindlist(
lapply(lapply(d2, function(x) if(nrow(x)==0) data.frame(id=NA, value=NA) else x), #in case there is no dataframe row in a list assign a blank dataframe
function(y) y %>% spread(id, value)), #convert all dataframes in wide format
fill = T) %>% #rbind all dataframe in a single dataframe
select(-`<NA>`) %>%
cbind.data.frame(bundle = bundle$bundle)
Output is:
35 211 220 1109 1462 394 381 926 930 bundle
1: 0.2 0.1 0.2 0.2 0.2 NA NA NA NA 284993459
2: NA NA NA NA NA NA NA NA NA 1048768805
3: NA NA NA NA 0.5 0.5 NA NA NA 511310430
4: NA NA NA NA 0.3 NA 0.3 0.3 0.2 1034630958
5: NA NA NA NA 0.5 NA NA 0.5 NA 1235581326
Sample data:
bundle <- data.frame(bundle = c(284993459,1048768805,511310430,1034630958,1235581326))
d2 <- list(data.frame(id = c(35,1462,1109,220,211), value = c(0.2, 0.2, 0.2,0.2,0.1)),
data.frame(id = NULL, value = NULL),
data.frame(id = c(394,1462), value = c(0.5,0.5)),
data.frame(id = c(926,1462,381,930), value = c(0.3,0.3,0.3,0.2)),
data.frame(id = c(926,1462), value = c(0.5,0.5)))
Upvotes: 0
Reputation: 47300
Here's a tidyverse
solution. First we add a bundle column to all data.frames
and stitch them together using purr::map2_dfr
, then use tidyr::spread
to format as wide.
library(tidyverse)
res <- map2_dfr(bundle$bundle,d2,~mutate(.y,bundle=.x)) %>%
spread(id,value,)
res[is.na(res)] <- 0
# bundle 35 211 220 381 394 926 930 1109 1462
# 1 284993459 0.2 0.1 0.2 0.0 0.0 0.0 0.0 0.2 0.2
# 2 511310430 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5
# 3 1034630958 0.0 0.0 0.0 0.3 0.0 0.3 0.2 0.0 0.3
# 4 1235581326 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5
Upvotes: 2
Reputation: 25225
You can first add the bundle to each data.frame within the list, then pivot it using reshape2::dcast
or data.table::dcast
before updating NAs to 0
ans <- data.table::dcast(
do.call(rbind, Map(function(nm, DF) within(DF, bundle <- nm), bundle$bundle, d2)),
bundle ~ id)
ans[is.na(ans)] <- 0
ans
# bundle 35 211 220 381 394 926 930 1109 1462
#1 284993459 0.2 0.1 0.2 0.0 0.0 0.0 0.0 0.2 0.2
#2 511310430 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5
#3 1034630958 0.0 0.0 0.0 0.3 0.0 0.3 0.2 0.0 0.3
#4 1235581326 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5
edit: adding more explanations after OP's comment
1) function(nm, DF) within(DF, bundle <- nm)
takes the input data.frame DF and adds a new column called bundle
with values equal to nm
.
2) Map
applies a function to the corresponding elements of given vectors. (see ?Map
) That means that Map
applies the above function using each of the bundle
values and add them to each data.frame in d2
Upvotes: 0