sakwa

Reputation: 65

Find element in vector a that's in the same list as element in vector b

I've got a list of 3 lists categorizing things into fruits, vehicles and flowers.

category <-
  structure(
    list(
      fruits = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
      vehicles = c("car", "bike", "motorbike", "train", "plane"),
      flowers <- list("rose", "tulip", "sunflower")
    ),
    .Names = c(
      "fruits", "vehicles", "flowers"
    )
  )

Then I've got a dataframe with 2 vectors containing the elements from the lists. Vector a can have any number of objects per cell, vector b just has one element per cell.

a <- I(list(c("apple", "car"), 
        c("motorbike", "banana", "tulip"), 
        c("rose", "kiwi", "apple"), 
        c("bike", "sunflower", "lemon"), 
        c("orange"), 
        c("tulip", "pear")))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")
funnydata <- data.frame(a, b)

I want to create a third vector, c, which gives the element(s) in vector a that are in the same list/category as the element in vector b. So the desired result would be

             a         b      c
1   apple, car motorbike    car
2 motorbik....      pear banana
3 rose, ki.... sunflower   rose
4 bike, su....    orange  lemon
5       orange       car     NA
6  tulip, pear     apple   pear

I can get the elements of vector a that are in a specific list, as long as I keep the list fixed:

funnydata$c <- sapply(funnydata$a, function(x) intersect(category$fruits, unlist(x))) # fixed list

funnydata$c
[[1]]
[1] "apple"

[[2]]
[1] "banana"

[[3]]
[1] "apple" "kiwi" 

[[4]]
[1] "lemon"

[[5]]
[1] "orange"

[[6]]
[1] "pear"

I can also specify the list b is in:

sapply(funnydata$b, function(y) names(category[grep(y, category) ]))

[1] "vehicles" "fruits"   "flowers"  "fruits"   "vehicles" "fruits"

But I'm stuck at combining the two. I get all character(0) if I try

funnydata$c <- sapply(funnydata$a, function(x) intersect(sapply(funnydata$b, function(y) 
  category[grep(y, category) ]), unlist(x)))

Can somebody help?

Edit

I noticed a mistake in the original posting: the objects in category are all supposed to be of the same type (vector or list, whichever fits the needs better). So it should be:

category <-
  structure(
    list(
      fruits = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
      vehicles = c("car", "bike", "motorbike", "train", "plane"),
      flowers = c("rose", "tulip", "sunflower")
    ),
    .Names = c(
      "fruits", "vehicles", "flowers"
    )
  )

Don't know if that changes anything for the existing answers. I'm still trying to wrap my mind around them. I'm sorry if this copy-and-paste error made things more complicated than they had to be.
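
Whether the correction matters can be checked directly. The following is a minimal sketch, assuming the only difference between the two versions is that flowers was a list rather than a character vector. The names category_old and category_new are made up for this check. As far as I can tell, the answers below work from the flattened members and the group sizes, so those are the two things compared.

```r
# Original posting: flowers is a list instead of a character vector.
category_old <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = list("rose", "tulip", "sunflower")
)

# Corrected posting: all three elements are character vectors.
category_new <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)

# Compare the flattened members and the number of members per group.
same_members <- identical(
  unlist(category_old, use.names = FALSE),
  unlist(category_new, use.names = FALSE)
)
same_sizes <- identical(lengths(category_old), lengths(category_new))
```

If both same_members and same_sizes are TRUE, code that only relies on unlist(category) and lengths(category) should behave the same for either version.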

Upvotes: 2

Views: 76

Answers (2)

Nathan Werth

Reputation: 5263

Most problems concerning data.frames with list columns can be solved by converting those list columns into "flat" vectors.

So we'll convert the two original objects (the list category and the data.frame funnydata) into longer, flat data.frames:

category_df <- data.frame(
  group  = rep(names(category), times = lengths(category)),
  member = unlist(category)
)

category_df
#              group    member
# fruits1     fruits     apple
# fruits2     fruits    banana
# fruits3     fruits      pear
# fruits4     fruits     lemon
# fruits5     fruits      kiwi
# fruits6     fruits    orange
# vehicles1 vehicles       car
# vehicles2 vehicles      bike
# vehicles3 vehicles motorbike
# vehicles4 vehicles     train
# vehicles5 vehicles     plane
# flowers1   flowers      rose
# flowers2   flowers     tulip
# flowers3   flowers sunflower

funnydata[["index"]] <- seq_len(nrow(funnydata))
funny_flat <- data.frame(
  a     = unlist(funnydata[["a"]]),
  b     = rep(funnydata[["b"]], times = lengths(funnydata[["a"]])),
  index = rep(funnydata[["index"]], times = lengths(funnydata[["a"]]))
)

funny_flat
#            a         b index
# 1      apple motorbike     1
# 2        car motorbike     1
# 3  motorbike      pear     2
# 4     banana      pear     2
# 5      tulip      pear     2
# 6       rose sunflower     3
# 7       kiwi sunflower     3
# 8      apple sunflower     3
# 9       bike    orange     4
# 10 sunflower    orange     4
# 11     lemon    orange     4
# 12    orange       car     5
# 13     tulip     apple     6
# 14      pear     apple     6

I also added an index, so we know which values came from which original rows. Now it's just a couple of simple merges, with some renaming.

funny_flat <- merge(funny_flat, category_df, by.x = "a", by.y = "member")
names(funny_flat)[names(funny_flat) == "group"] <- "group_a"

funny_flat <- merge(funny_flat, category_df, by.x = "b", by.y = "member")
names(funny_flat)[names(funny_flat) == "group"] <- "group_b"

funny_flat
#            b         a index  group_a  group_b
# 1      apple      pear     6   fruits   fruits
# 2      apple     tulip     6  flowers   fruits
# 3        car    orange     5   fruits vehicles
# 4  motorbike     apple     1   fruits vehicles
# 5  motorbike       car     1 vehicles vehicles
# 6     orange      bike     4 vehicles   fruits
# 7     orange     lemon     4   fruits   fruits
# 8     orange sunflower     4  flowers   fruits
# 9       pear motorbike     2 vehicles   fruits
# 10      pear    banana     2   fruits   fruits
# 11      pear     tulip     2  flowers   fruits
# 12 sunflower     apple     3   fruits  flowers
# 13 sunflower      rose     3  flowers  flowers
# 14 sunflower      kiwi     3   fruits  flowers

Now, we'll code your original goal: finding values for which a and b share a category. c will be the value from a, so that's also just a renaming.

funny_matching <- funny_flat[funny_flat[["group_a"]] == funny_flat[["group_b"]], ]
names(funny_matching)[names(funny_matching) == "a"] <- "c"
funny_matching
#            b      c index  group_a  group_b
# 1      apple   pear     6   fruits   fruits
# 5  motorbike    car     1 vehicles vehicles
# 7     orange  lemon     4   fruits   fruits
# 10      pear banana     2   fruits   fruits
# 13 sunflower   rose     3  flowers  flowers

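Finally, merge the matches back onto the original data using the index from before. all.x = TRUE keeps rows without a match (row 5 here) and fills c with NA.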

merge(
  funnydata,
  funny_matching[, c("c", "index")],
  by = "index",
  all.x = TRUE
)
#   index            a         b      c
# 1     1   apple, car motorbike    car
# 2     2 motorbik....      pear banana
# 3     3 rose, ki.... sunflower   rose
# 4     4 bike, su....    orange  lemon
# 5     5       orange       car   <NA>
# 6     6  tulip, pear     apple   pear
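
For comparison, the same matching can be sketched without reshaping, using a named lookup vector from member to group and a per-row filter. This is an alternative technique, not a restatement of the merge approach above. The names group_of and same_group are invented for this sketch, and it assumes every element of a and b appears in exactly one category.

```r
# Rebuild the question's data (using the corrected category from the Edit).
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)
a <- I(list(c("apple", "car"),
            c("motorbike", "banana", "tulip"),
            c("rose", "kiwi", "apple"),
            c("bike", "sunflower", "lemon"),
            c("orange"),
            c("tulip", "pear")))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")
funnydata <- data.frame(a, b, stringsAsFactors = FALSE)

# Named lookup vector: names are the members, values are their groups.
group_of <- setNames(
  rep(names(category), times = lengths(category)),
  unlist(category, use.names = FALSE)
)

# Keep the elements of x whose group equals the group of y;
# return NA when nothing matches so every row has a value.
same_group <- function(x, y) {
  hit <- x[group_of[x] %in% group_of[[y]]]
  if (length(hit) == 0) NA_character_ else hit
}

# One result per row; a list, because a row may have several matches.
c_col <- Map(same_group, unclass(funnydata$a), as.character(funnydata$b))
funnydata$c <- c_col
```

The result is kept as a list column because a row could have more than one element of a in the same category as b. In the example data each row has at most one match, so unlist(c_col) gives a plain character vector.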

Upvotes: 2

akrun

Reputation: 886978

We can do this with joins:

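library(tidyverse)
dat <- rownames_to_column(funnydata, 'rn')  # keep the original row id
catdat <- stack(category)                   # long table of members and categories
dat %>% 
   unnest(a) %>%                            # one row per element of a (recent tidyr wants the column named)
   left_join(catdat, by = c(a = "values")) %>%
   left_join(catdat, by = c(b = "values")) %>%
   filter(ind.x == ind.y) %>%               # keep rows where a and b share a category
   select(rn, c = a) %>% 
   right_join(dat, by = "rn") %>%           # bring back rows without a match
   select(names(funnydata), c)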
#            a         b      c
#1   apple, car motorbike    car
#2 motorbik....      pear banana
#3 rose, ki.... sunflower   rose
#4 bike, su....    orange  lemon
#5       orange       car   <NA>
#6  tulip, pear     apple   pear
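
The columns ind.x and ind.y in the pipeline above come from stack() and from dplyr's default join suffixes. The following short sketch shows the shape of the stacked table. It rebuilds category from the question's Edit so that it runs on its own.

```r
# The corrected category list from the question's Edit.
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)

# stack() turns the named list into a two-column data.frame:
# "values" holds the members, "ind" holds the name of the list element.
catdat <- stack(category)

names(catdat)  # the two column names
nrow(catdat)   # one row per member

# Look up the category of a single member.
tulip_group <- as.character(catdat$ind[catdat$values == "tulip"])
```

Because catdat is joined twice and both copies carry a column named ind, the second join has to disambiguate them. dplyr's default suffixes are ".x" and ".y", so the category of a ends up in ind.x and the category of b in ind.y.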

Upvotes: 2
