Reputation: 65
I've got a list of 3 lists categorizing things into fruits, vehicles and flowers.
category <-
structure(
list(
fruits = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
vehicles = c("car", "bike", "motorbike", "train", "plane"),
flowers <- list("rose", "tulip", "sunflower")
),
.Names = c(
"fruits", "vehicles", "flowers"
)
)
Then I've got a data frame with 2 vectors (columns) containing the elements from the lists. Vector a can have any number of elements per cell; vector b has just one element per cell.
a <- I(list(c("apple", "car"),
c("motorbike", "banana", "tulip"),
c("rose", "kiwi", "apple"),
c("bike", "sunflower", "lemon"),
c("orange"),
c("tulip", "pear")))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")
funnydata <- data.frame(a, b)
I want to create a third vector c which gives the element(s) in vector a that are in the same list/category as the element in vector b. So the desired result would be:
             a         b      c
1   apple, car motorbike    car
2 motorbik....      pear banana
3 rose, ki.... sunflower   rose
4 bike, su....    orange  lemon
5       orange       car     NA
6  tulip, pear     apple   pear
I manage to get the element in vector a that's in a specific list as long as I leave the list fixed:
funnydata$c <- sapply(funnydata$a, function(x) intersect(category$fruits, unlist(x))) # fixed list
funnydata$c
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "apple" "kiwi"
[[4]]
[1] "lemon"
[[5]]
[1] "orange"
[[6]]
[1] "pear"
I can also specify the list b is in:
sapply(funnydata$b, function(y) names(category[grep(y, category) ]))
[1] "vehicles" "fruits" "flowers" "fruits" "vehicles" "fruits"
But I'm stuck at combining the two. I get all character(0) if I try
funnydata$c <- sapply(funnydata$a, function(x) intersect(sapply(funnydata$b, function(y)
category[grep(y, category) ]), unlist(x)))
Can somebody help?
Edit
I noticed a mistake in the original posting: the objects in category are all supposed to be of the same type (vector or list, whichever fits the needs better), so it should be:
category <-
structure(
list(
fruits = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
vehicles = c("car", "bike", "motorbike", "train", "plane"),
flowers = c("rose", "tulip", "sunflower")
),
.Names = c(
"fruits", "vehicles", "flowers"
)
)
Don't know if that changes anything for the existing answers. I'm still trying to wrap my mind around them. I'm sorry if this copy-and-paste error made things more complicated than they had to be.
Upvotes: 2
Views: 76
Reputation: 5263
Most problems concerning data.frames with list columns can be solved by converting those list columns into "flat" vectors.
So we'll convert both the category list and the original data.frame into longer, flat data.frames:
category_df <- data.frame(
group = rep(names(category), times = lengths(category)),
member = unlist(category)
)
category_df
# group member
# fruits1 fruits apple
# fruits2 fruits banana
# fruits3 fruits pear
# fruits4 fruits lemon
# fruits5 fruits kiwi
# fruits6 fruits orange
# vehicles1 vehicles car
# vehicles2 vehicles bike
# vehicles3 vehicles motorbike
# vehicles4 vehicles train
# vehicles5 vehicles plane
# flowers1 flowers rose
# flowers2 flowers tulip
# flowers3 flowers sunflower
funnydata[["index"]] <- seq_len(nrow(funnydata))
funny_flat <- data.frame(
a = unlist(funnydata[["a"]]),
b = rep(funnydata[["b"]], times = lengths(funnydata[["a"]])),
index = rep(funnydata[["index"]], times = lengths(funnydata[["a"]]))
)
funny_flat
# a b index
# 1 apple motorbike 1
# 2 car motorbike 1
# 3 motorbike pear 2
# 4 banana pear 2
# 5 tulip pear 2
# 6 rose sunflower 3
# 7 kiwi sunflower 3
# 8 apple sunflower 3
# 9 bike orange 4
# 10 sunflower orange 4
# 11 lemon orange 4
# 12 orange car 5
# 13 tulip apple 6
# 14 pear apple 6
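As a quick sanity check on the flattening step, here is a minimal sketch showing that splitting the flat values by index recovers the original list column. It assumes the funnydata from the question; the names n and rebuilt are only illustrative.

```r
# Rebuild the question's data: list column `a`, plain column `b`.
a <- I(list(c("apple", "car"),
            c("motorbike", "banana", "tulip"),
            c("rose", "kiwi", "apple"),
            c("bike", "sunflower", "lemon"),
            c("orange"),
            c("tulip", "pear")))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")
funnydata <- data.frame(a, b, stringsAsFactors = FALSE)
funnydata[["index"]] <- seq_len(nrow(funnydata))

# Flatten: one row per element of `a`, tagged with its original row index.
n <- lengths(funnydata[["a"]])
funny_flat <- data.frame(
  a = unlist(funnydata[["a"]]),
  index = rep(funnydata[["index"]], times = n),
  stringsAsFactors = FALSE
)

# Splitting by index groups the flat values back into their original rows.
rebuilt <- split(funny_flat[["a"]], funny_flat[["index"]])
rebuilt[["2"]]
```

Because the round trip works, nothing is lost by operating on the flat version.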
I also added an index, so we know which values came from which original rows. Now it's just a couple of simple merges, with some renaming.
funny_flat <- merge(funny_flat, category_df, by.x = "a", by.y = "member")
names(funny_flat)[names(funny_flat) == "group"] <- "group_a"
funny_flat <- merge(funny_flat, category_df, by.x = "b", by.y = "member")
names(funny_flat)[names(funny_flat) == "group"] <- "group_b"
funny_flat
# b a index group_a group_b
# 1 apple pear 6 fruits fruits
# 2 apple tulip 6 flowers fruits
# 3 car orange 5 fruits vehicles
# 4 motorbike apple 1 fruits vehicles
# 5 motorbike car 1 vehicles vehicles
# 6 orange bike 4 vehicles fruits
# 7 orange lemon 4 fruits fruits
# 8 orange sunflower 4 flowers fruits
# 9 pear motorbike 2 vehicles fruits
# 10 pear banana 2 fruits fruits
# 11 pear tulip 2 flowers fruits
# 12 sunflower apple 3 fruits flowers
# 13 sunflower rose 3 flowers flowers
# 14 sunflower kiwi 3 fruits flowers
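If the row reordering caused by merge gets in the way, the same group columns can be added with match() instead. This is only a sketch on a hand-written subset of the flat data (the data.frame flat is made up for illustration), and it assumes every value occurs exactly once in category_df:

```r
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)
category_df <- data.frame(
  group  = rep(names(category), times = lengths(category)),
  member = unlist(category, use.names = FALSE),
  stringsAsFactors = FALSE
)

# A few flat (a, b, index) rows, shaped like the output of the flattening step.
flat <- data.frame(
  a = c("apple", "car", "motorbike", "banana", "tulip"),
  b = c("motorbike", "motorbike", "pear", "pear", "pear"),
  index = c(1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# match() gives, for each value, its row position in category_df$member;
# indexing `group` with those positions looks up the category.
flat[["group_a"]] <- category_df[["group"]][match(flat[["a"]], category_df[["member"]])]
flat[["group_b"]] <- category_df[["group"]][match(flat[["b"]], category_df[["member"]])]
flat
```

Unlike merge, this keeps the rows in their original order, and a value missing from category_df gets NA instead of being dropped.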
Now, we'll code your original goal: finding values for which a and b share a category. c will be the value from a, so that's also just a renaming.
funny_matching <- funny_flat[funny_flat[["group_a"]] == funny_flat[["group_b"]], ]
names(funny_matching)[names(funny_matching) == "a"] <- "c"
funny_matching
# b c index group_a group_b
# 1 apple pear 6 fruits fruits
# 5 motorbike car 1 vehicles vehicles
# 7 orange lemon 4 fruits fruits
# 10 pear banana 2 fruits fruits
# 13 sunflower rose 3 flowers flowers
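A side note: the question asks for the element(s) in a, so a row could have more than one match. In the example data each index matches at most once, but with several matches a plain merge on index would repeat that row. One possible way to handle this, sketched on made-up matches (the data below and the names c_by_index, all_index and c_list are hypothetical), is to collect the matches per index into a list first:

```r
# Hypothetical matched rows: index 3 has two values sharing b's category.
funny_matching <- data.frame(
  index = c(1, 3, 3, 6),
  c = c("car", "kiwi", "apple", "pear"),
  stringsAsFactors = FALSE
)

# Group all matches by original row, so that each index appears only once.
c_by_index <- split(funny_matching[["c"]], funny_matching[["index"]])

# Look up every original row; rows without any match get NA.
all_index <- 1:6
c_list <- lapply(as.character(all_index), function(i) {
  if (is.null(c_by_index[[i]])) NA_character_ else c_by_index[[i]]
})
c_list[[3]]
```

The resulting list has one entry per original row and could be stored as a list column, just like a.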
Again, a merge, using the index from before.
merge(
funnydata,
funny_matching[, c("c", "index")],
by = "index",
all.x = TRUE
)
# index a b c
# 1 1 apple, car motorbike car
# 2 2 motorbik.... pear banana
# 3 3 rose, ki.... sunflower rose
# 4 4 bike, su.... orange lemon
# 5 5 orange car <NA>
# 6 6 tulip, pear apple pear
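For comparison, here is a more compact base R sketch that skips the intermediate data.frames. It is not the approach above but an alternative built on a named lookup vector (lookup and c_col are illustrative names), and it assumes the corrected category from the question's edit and that every value belongs to exactly one category:

```r
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)
a <- list(c("apple", "car"),
          c("motorbike", "banana", "tulip"),
          c("rose", "kiwi", "apple"),
          c("bike", "sunflower", "lemon"),
          c("orange"),
          c("tulip", "pear"))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")

# Named vector mapping each member to its category name.
lookup <- setNames(rep(names(category), times = lengths(category)),
                   unlist(category, use.names = FALSE))

# For each row, keep the elements of `a` whose category equals that of `b`.
c_col <- Map(function(x, y) {
  hit <- x[lookup[x] == lookup[[y]]]
  if (length(hit) == 0) NA_character_ else hit
}, a, b)

unlist(c_col)
# [1] "car"    "banana" "rose"   "lemon"  NA       "pear"
```

Exact name lookup also avoids the substring matching that grep does on the deparsed list.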
Upvotes: 2
Reputation: 886978
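We can do this with joins:
library(tidyverse)
# keep the original row order as an explicit key
dat <- rownames_to_column(funnydata, 'rn')
# long lookup table: `values` holds the member, `ind` its category
catdat <- stack(category)
dat %>%
  unnest(a) %>%                                 # one row per element of a
  left_join(catdat, by = c(a = "values")) %>%   # category of a -> ind.x
  left_join(catdat, by = c(b = "values")) %>%   # category of b -> ind.y
  filter(ind.x == ind.y) %>%
  select(rn, c=a) %>%
  right_join(dat, by = "rn") %>%                # bring back rows without a match
  select(names(funnydata), c)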
# a b c
#1 apple, car motorbike car
#2 motorbik.... pear banana
#3 rose, ki.... sunflower rose
#4 bike, su.... orange lemon
#5 orange car <NA>
#6 tulip, pear apple pear
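To see where the ind.x and ind.y columns in the filter step come from, it may help to look at what stack() returns. The sketch below uses base R only and assumes the corrected category from the question's edit:

```r
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)

# stack() turns a named list of vectors into a two-column data.frame:
# `values` holds the elements, `ind` (a factor) holds the list names.
catdat <- stack(category)
names(catdat)
# [1] "values" "ind"
head(catdat, 3)
```

Since catdat is joined twice and both copies bring along an ind column, dplyr disambiguates them with its default suffixes, .x for the first join and .y for the second.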
Upvotes: 2