Reputation: 77

Subset a dataframe based on multiple columns simultaneously

Suppose I have two dataframes like

> r1 <- data.frame(replicate(5, sample(1:3, 5, replace = TRUE)))
> r2 <- data.frame(replicate(5, sample(1:3, 5, replace = TRUE)))

> r1
  X1 X2 X3 X4 X5
1  2  3  1  3  1
2  1  3  1  1  3
3  3  2  3  3  3
4  1  1  1  2  3
5  1  1  3  2  3

> r2
  X1 X2 X3 X4 X5
1  1  3  3  3  2
2  3  1  2  1  2
3  1  1  1  2  2
4  2  3  1  2  2
5  2  2  1  2  3

I would like to subset r1 so that the result would only contain rows where r1$X1 == r2$X1 AND r1$X2 == r2$X2 AND r1$X3 == r2$X3, e.g.

r1_example
  X1 X2 X3 X4 X5
1  2  3  1  3  1
4  1  1  1  2  3

If I actually do subset, I'm getting more values, because some of the columns separately correspond to the r2 columns.

> r1_sub <- subset(r1, X1 %in% r2$X1 & X2 %in% r2$X2 & X3 %in% r2$X3)
> r1_sub
  X1 X2 X3 X4 X5
1  2  3  1  3  1
2  1  3  1  1  3
3  3  2  3  3  3
4  1  1  1  2  3
5  1  1  3  2  3

I can figure a workaround like

> r1$concat <- paste(r1$X1, '&', r1$X2, '&', r1$X3)
> r2$concat <- paste(r2$X1, '&', r2$X2, '&', r2$X3)
> r1_concat <- subset(r1, concat %in% r2$concat)
> r1_concat
  X1 X2 X3 X4 X5    concat
1  2  3  1  3  1 2 & 3 & 1
4  1  1  1  2  3 1 & 1 & 1

But that's crude to say the least. Is there a more elegant solution?

Upvotes: 0

Answers (2)

Yuriy Saraykin

Reputation: 8880

using semi_join

library(tidyverse)

r1 <- read.table(text = "  X1 X2 X3 X4 X5
1  2  3  1  3  1
2  1  3  1  1  3
3  3  2  3  3  3
4  1  1  1  2  3
5  1  1  3  2  3", header = T)

r2 <- read.table(text = "  X1 X2 X3 X4 X5
1  1  3  3  3  2
2  3  1  2  1  2
3  1  1  1  2  2
4  2  3  1  2  2
5  2  2  1  2  3", header = T)

semi_join(r1, r2, by = c("X1", "X2", "X3"))
#>   X1 X2 X3 X4 X5
#> 1  2  3  1  3  1
#> 4  1  1  1  2  3

^{Created on 2021-03-12 by the reprex package (v1.0.0)}

Upvotes: 1

Ronak Shah

Reputation: 389235

You can keep only the columns X1 to X3 in r2 and merge the data :

merge(r1, r2[c('X1', 'X2', 'X3')])

   X1 X2 X3 X4 X5
#1  1  1  1  2  3
#2  2  3  1  3  1

In dplyr :

library(dplyr)
r2 %>% select(X1:X3) %>% inner_join(r1)

Upvotes: 2

Subset a dataframe based on multiple columns simultaneously

Answers (2)

Related Questions