Select from Multiple Columns - R

Question

I have two data frames. One holds combinations of strings in two columns, each naming a type of food:

#df.combination

     [,1]     [,2]
[1,] "Apple"  "Orange"
[2,] "Apple"  "Pear"
[3,] "Apple"  "Avocado"
[4,] "Orange" "Pear"
[5,] "Orange" "Avocado"
[6,] "Pear"   "Avocado"

The other is a big "main" data frame with five columns ("id", "date", "food1", "food2", "food3"), three of which hold food. Some of its rows contain these combinations:

#df.main

     [,1]   [,2]   [,3]     [,4]      [,5]
[1,] "1234" "3/29" "Sala"   "Pear"    "Avocado"
[2,] "1235" "3/30" "Apple"  "Pear"    "Meat"
[3,] "1236" "4/1"  "Orange" "Juice"   "Apple"
[4,] "1237" "4/2"  "Pear"   "Avocado" "Turkey"

How can I write a script that searches df.main and selects the rows containing all elements of df.combination[1,] (so "Apple" and "Orange")? The foods do not have to be in any particular order; the row just needs to contain each of them (i.e. df.main[3,]).

Here is an example of the output I would like to see. If I search for "Orange" and "Apple" (so df.combination[1,]) in df.main, I would like to get the id of row df.main[3,]:

#search df.main for row containing df.combination[1,]
#output:
#1236

Thank you! Any help really appreciated.

akrun · Accepted Answer

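You could try the following, which checks the food columns of each row of df.main for all elements of the chosen row of df.combination: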

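 # dat1: main data, dat2: combinations, rowindex: row of dat2 to search for
 f1 <- function(dat1, dat2, rowindex){
  # TRUE for each row of dat1 whose food columns hold every element of dat2[rowindex,]
  Indx <- apply(dat1[,grep('food', colnames(dat1))], 1,
         function(x) all(unlist(dat2[rowindex,]) %in% x))
  # return the first column (id) of the matching rows
  dat1[Indx,1]
 }
 f1(df.main, df.combination,1)
 #[1] 1236
 f1(df.main, df.combination,2)
 #[1] 1235
 f1(df.main, df.combination,3)
 #integer(0)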

data

df.main <- structure(list(id = 1234:1237, date = c("3/29", "3/30",
"4/1", 
"4/2"), food1 = c("Sala", "Apple", "Orange", "Pear"), 
food2 = c("Pear", 
"Pear", "Juice", "Avocado"), food3 = c("Avocado", "Meat", "Apple", 
 "Turkey")), .Names = c("id", "date", "food1", "food2", 
 "food3"), class = "data.frame", row.names = c(NA, -4L))

df.combination <- structure(list(V1 = c("Apple", "Apple", "Apple", 
"Orange", "Orange", 
"Pear"), V2 = c("Orange", "Pear", "Avocado", "Pear", "Avocado", 
"Avocado")), .Names = c("V1", "V2"), class = "data.frame",
row.names = c(NA, -6L))
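
If you need the matching ids for every row of df.combination rather than one row at a time, the same `%in%` check could be wrapped in a loop over the combinations. The following is only a sketch under a few assumptions: `match_all_combinations` is a hypothetical helper name (not part of the answer above), the food columns are assumed to start with "food", and the id column is assumed to be named "id". The data is rebuilt with data.frame() so the block runs on its own.

```r
# Same data as above, rebuilt so this block is self-contained.
df.main <- data.frame(
  id    = 1234:1237,
  date  = c("3/29", "3/30", "4/1", "4/2"),
  food1 = c("Sala", "Apple", "Orange", "Pear"),
  food2 = c("Pear", "Pear", "Juice", "Avocado"),
  food3 = c("Avocado", "Meat", "Apple", "Turkey"),
  stringsAsFactors = FALSE
)
df.combination <- data.frame(
  V1 = c("Apple", "Apple", "Apple", "Orange", "Orange", "Pear"),
  V2 = c("Orange", "Pear", "Avocado", "Pear", "Avocado", "Avocado"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a named list with one vector of ids per combination.
match_all_combinations <- function(main, combos) {
  # Keep only the food columns (assumed to be named food1, food2, ...).
  foods <- as.matrix(main[, grep("^food", colnames(main)), drop = FALSE])
  ids <- lapply(seq_len(nrow(combos)), function(i) {
    wanted <- unlist(combos[i, ], use.names = FALSE)
    # A row matches when every wanted food appears somewhere in it.
    hit <- apply(foods, 1, function(x) all(wanted %in% x))
    main$id[hit]
  })
  # Label each result by its combination, e.g. "Apple+Orange".
  names(ids) <- do.call(paste, c(combos, sep = "+"))
  ids
}

res <- match_all_combinations(df.main, df.combination)
res[["Apple+Orange"]]
#[1] 1236
res[["Pear+Avocado"]]
#[1] 1234 1237
```

Combinations with no matching row come back as integer(0), so `lengths(res) > 0` can be used to keep only the combinations that actually occur in df.main.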
