Reputation: 1204
I am looking for a faster solution for the following problem.
Suppose I have the following two data-sets.
df1 <- data.frame(Var1 = c(5011, 2484, 4031, 1143, 7412),
Var2 = c(2161, 2161, 2161, 2161, 8595))
df2 <- data.frame(team=c("A","A", "B", "B", "B", "C", "C", "D", "D"),
class=c("5011", "2161", "2484", "4031", "1143", "2161", "5011", "8595", "1143"),
attribute=c("X1", "X2", "X1", "Z1", "Z2", "Y1", "X1", "Z1", "X2"),
stringsAsFactors=FALSE)
> df1
Var1 Var2
1 5011 2161
2 2484 2161
3 4031 2161
4 1143 2161
5 7412 8595
> df2
team class attribute
1 A 5011 X1
2 A 2161 X2
3 B 2484 X1
4 B 4031 Z1
5 B 1143 Z2
6 C 2161 Y1
7 C 5011 X1
8 D 8595 Z1
9 D 1143 X2
I would like to know which teams in df2
meet in class
that correspond to rows in df1
. I am not interested in order with-in rows.
My current code (pasted below) works, but is hopelessly inefficient.
Some rules:
df1
.df1
. They are excluded from the output.Code:
teams <- c()
atts <- c()
pxs <- unique(df2$team)
for(j in pxs){
subs <- subset(df2, team==j)
for(i in 1:nrow(df1)){
if(all(df1[i,] %in% subs$class)){
teams <- rbind(teams, subs$team[i])
atts <- rbind(atts, subs$attribute)
}
}
}
output <- cbind(teams, atts)
> output
[,1] [,2] [,3]
[1,] "A" "X1" "X2"
[2,] "C" "Y1" "X1"
The original data consists of millions of rows in both df1
and df2
.
How to do this more efficiently? Perhaps through an apply
approach combined with data.table
?
Upvotes: 0
Views: 59
Reputation: 25225
Not quite sure what your rules are trying to achieve.
Based on your sample data, code and output, you might want to join by each column of df1 first then inner join the 2 results:
library(data.table)
setDT(df1)
setDT(df2)[, cls := as.integer(cls)]
#left join df1 with df2 using Var1
v1 <- df2[df1, on=.(cls=Var1)]
#left join df1 with df2 using Var2
v2 <- df2[df1, on=.(cls=Var2)]
#inner join the 2 previous results to ensure that the same team is picked
#where classes already match in v1 and v2
v1[v2, on=.(team, cls=Var1, Var2=cls), nomatch=0L]
output:
team cls attribute Var2 i.attribute
1: A 5011 X1 2161 X2
2: C 5011 X1 2161 Y1
Upvotes: 1