Reputation: 77
Suppose I have two dataframes like
> r1 <- data.frame(replicate(5, sample(1:3, 5, replace = TRUE)))
> r2 <- data.frame(replicate(5, sample(1:3, 5, replace = TRUE)))
> r1
X1 X2 X3 X4 X5
1 2 3 1 3 1
2 1 3 1 1 3
3 3 2 3 3 3
4 1 1 1 2 3
5 1 1 3 2 3
> r2
X1 X2 X3 X4 X5
1 1 3 3 3 2
2 3 1 2 1 2
3 1 1 1 2 2
4 2 3 1 2 2
5 2 2 1 2 3
I would like to subset r1 so that the result would only contain rows where r1$X1 == r2$X1 AND r1$X2 == r2$X2 AND r1$X3 == r2$X3, e.g.
r1_example
X1 X2 X3 X4 X5
1 2 3 1 3 1
4 1 1 1 2 3
If I actually do subset, I'm getting more values, because some of the columns separately correspond to the r2 columns.
> r1_sub <- subset(r1, X1 %in% r2$X1 & X2 %in% r2$X2 & X3 %in% r2$X3)
> r1_sub
X1 X2 X3 X4 X5
1 2 3 1 3 1
2 1 3 1 1 3
3 3 2 3 3 3
4 1 1 1 2 3
5 1 1 3 2 3
I can figure a workaround like
> r1$concat <- paste(r1$X1, '&', r1$X2, '&', r1$X3)
> r2$concat <- paste(r2$X1, '&', r2$X2, '&', r2$X3)
> r1_concat <- subset(r1, concat %in% r2$concat)
> r1_concat
X1 X2 X3 X4 X5 concat
1 2 3 1 3 1 2 & 3 & 1
4 1 1 1 2 3 1 & 1 & 1
But that's crude to say the least. Is there a more elegant solution?
Upvotes: 0
Views: 39
Reputation: 8880
using semi_join
library(tidyverse)
r1 <- read.table(text = " X1 X2 X3 X4 X5
1 2 3 1 3 1
2 1 3 1 1 3
3 3 2 3 3 3
4 1 1 1 2 3
5 1 1 3 2 3", header = T)
r2 <- read.table(text = " X1 X2 X3 X4 X5
1 1 3 3 3 2
2 3 1 2 1 2
3 1 1 1 2 2
4 2 3 1 2 2
5 2 2 1 2 3", header = T)
semi_join(r1, r2, by = c("X1", "X2", "X3"))
#> X1 X2 X3 X4 X5
#> 1 2 3 1 3 1
#> 4 1 1 1 2 3
Created on 2021-03-12 by the reprex package (v1.0.0)
Upvotes: 1
Reputation: 388797
You can keep only the columns X1
to X3
in r2
and merge
the data :
merge(r1, r2[c('X1', 'X2', 'X3')])
X1 X2 X3 X4 X5
#1 1 1 1 2 3
#2 2 3 1 3 1
In dplyr
:
library(dplyr)
r2 %>% select(X1:X3) %>% inner_join(r1)
Upvotes: 2