Reputation: 1854
I have two datasets that I want to merge. One of the columns that I want to use as a key to merge has the values in a list. If any of those values appear in the second dataset’s column, I want the value in the other column to be merged into the first dataset – which might mean there are multiple values, which should be presented as a list.
That is quite hard to explain but hopefully this example data makes it clearer.
Example data
library(data.table)
mother_dt <- data.table(mother = c("Penny", "Penny", "Anya", "Sam", "Sam", "Sam"),
child = c("Violet", "Prudence", "Erika", "Jake", "Wolf", "Red"))
mother_dt [, children := .(list(unique(child))), by = mother]
mother_dt [, child := NULL]
mother_dt <- unique(mother_dt , by = "mother")
child_dt <- data.table(child = c("Violet", "Prudence", "Erika", "Jake", "Wolf", "Red"),
age = c(10, 8, 9, 6, 5, 2))
So for example, the first row in my new dataset would have “Penny” in themother
column, a list containing “Violet” and “Prudence” in the children
column, and a list containing 10 and 8 in the age
column.
I've tried the following:
combined_dt <- mother_dt[, child_age := ifelse(child_dt$child %in% children,
.(list(unique(child_dt$age))), NA)
But that just contains a list of all the ages in the final row.
I appreciate this is probably quite unusual behaviour but is there a way to achieve it?
Edit: The final datatable would look like this:
final_dt <- data.table(mother = c("Penny", "Anya", "Sam"),
children = c(list(c("Violet", "Prudence")), list(c("Erika")), list(c("Jake", "Wolf", "Red"))),
age = c(list(c(10, 8)), list(c(9)), list(c(6, 5, 2))))
Upvotes: 1
Views: 191
Reputation: 47350
You could do it the following way, which has the advantage of preserving duplicates in mother
column when they exist.
mother_dt$age <- lapply(
mother_dt$children,
function(x,y) y[x],
y = setNames(child_dt$age, child_dt$child))
mother_dt
# mother children age
# 1: Penny Violet,Prudence 10, 8
# 2: Anya Erika 9
# 3: Sam Jake,Wolf,Red 6,5,2
I translates nicely into tidyverse
syntax :
library(tidyverse)
mutate(mother_dt, age = map(children,~.y[.], deframe(child_dt)))
# mother children age
# 1 Penny Violet, Prudence 10, 8
# 2 Anya Erika 9
# 3 Sam Jake, Wolf, Red 6, 5, 2
Upvotes: 0
Reputation: 2764
You can do something like this-
library(splitstackshape)
newm <- mother_dt[,.(children=unlist(children)),by=mother]
final_dt <- merge(newm,child_dt,by.x = "children",by.y = "child")
> aggregate(. ~ mother, data = cv, toString)
mother children age
1 Anya Erika 9
2 Penny Prudence, Violet 8, 10
3 Sam Jake, Red, Wolf 6, 2, 5
Upvotes: 1
Reputation: 2017
The easiest way I can think of is, first unlist the children, then merge, then list again:
mother1 <- mother_dt[,.(children=unlist(children)),by=mother]
mother1[child_dt,on=c(children='child')][,.(children=list(children),age=list(age)),by=mother]
Upvotes: 1