wake_wake
wake_wake

Reputation: 1204

efficiently look-up data from one data.frame in another data.frame by group

I am looking for a faster solution for the following problem.

Suppose I have the following two data-sets.

df1 <- data.frame(Var1 = c(5011, 2484, 4031, 1143, 7412),
              Var2 = c(2161, 2161, 2161, 2161, 8595))
df2 <- data.frame(team=c("A","A", "B", "B", "B", "C", "C", "D", "D"),
              class=c("5011", "2161", "2484", "4031", "1143", "2161", "5011", "8595", "1143"),
              attribute=c("X1", "X2", "X1", "Z1", "Z2", "Y1", "X1", "Z1", "X2"),
              stringsAsFactors=FALSE)


> df1
  Var1 Var2
1 5011 2161
2 2484 2161
3 4031 2161
4 1143 2161
5 7412 8595

> df2
  team class attribute
1    A  5011        X1
2    A  2161        X2
3    B  2484        X1
4    B  4031        Z1
5    B  1143        Z2
6    C  2161        Y1
7    C  5011        X1
8    D  8595        Z1
9    D  1143        X2

I would like to know which teams in df2 meet in class that correspond to rows in df1. I am not interested in order with-in rows.

My current code (pasted below) works, but is hopelessly inefficient.

Some rules:

Code:

    teams <- c()
    atts <- c()
    pxs <- unique(df2$team)

    for(j in pxs){
     subs <- subset(df2, team==j)
     for(i in 1:nrow(df1)){
      if(all(df1[i,] %in% subs$class)){
    teams <- rbind(teams, subs$team[i])
    atts <- rbind(atts, subs$attribute)
     } 
     }
    }

    output <- cbind(teams, atts)  

> output
     [,1] [,2] [,3]
[1,] "A"  "X1" "X2"
[2,] "C"  "Y1" "X1"

The original data consists of millions of rows in both df1 and df2.

How to do this more efficiently? Perhaps through an apply approach combined with data.table?

Upvotes: 0

Views: 59

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

Not quite sure what your rules are trying to achieve.

Based on your sample data, code and output, you might want to join by each column of df1 first then inner join the 2 results:

library(data.table)
setDT(df1)
setDT(df2)[, cls := as.integer(cls)]

#left join df1 with df2 using Var1
v1 <- df2[df1, on=.(cls=Var1)]

#left join df1 with df2 using Var2
v2 <- df2[df1, on=.(cls=Var2)]

#inner join the 2 previous results to ensure that the same team is picked 
#where classes already match in v1 and v2
v1[v2, on=.(team, cls=Var1, Var2=cls), nomatch=0L]

output:

   team  cls attribute Var2 i.attribute
1:    A 5011        X1 2161          X2
2:    C 5011        X1 2161          Y1

Upvotes: 1

Related Questions