Chris

Reputation: 2071

filter multiple columns together with criteria - R

I have found plenty of similar questions (1, 2, 3 are some of them), but none of them answers mine:

I have this data:

set.seed(100)
df <- data.frame(X = sample(1:10, 100, replace=TRUE),
                 Y = sample(11:90, 100, replace=TRUE),
                 Z = sample(1000:2000, 100, replace=TRUE),
                 stringsAsFactors = FALSE)
x <- data.frame(X = c(7, 5, 3, 9),
                Y = c(14, 13, 19, 87),
                stringsAsFactors = FALSE)

Here x is a subset of df obtained through some grouping and computations. Now I'm trying to filter df by both columns of x. For example, a row of df should be TRUE if X=7 and Y=14, or if X=5 and Y=13, but FALSE if X=7 and Y<>14, and so on. So the criteria have to consider both columns together. I have tried this:

> df[df$X == x$X & df$Y == x$Y,]
   X  Y    Z
28 9 87 1071

This gives me only one matching row, when I know there have to be at least 4 (because x is a subset of df).

This is the kind of thing I'm looking for, but it gives me 0 rows:

df[df[,c("X","Y")] %in% x[,c("X","Y")],]

Expected Output:

   X  Y    Z
16 7 14 1632
28 9 87 1071
30 3 19 1297
38 7 14 1701
67 5 13 1323
77 9 87 1484
88 3 19 1951

Upvotes: 3

Views: 190

Answers (1)

akrun

Reputation: 887048

Maybe we need an inner_join:

library(dplyr)
inner_join(df, x)
#  X  Y    Z
#1 7 14 1632
#2 9 87 1071
#3 3 19 1297
#4 7 14 1701
#5 5 13 1323
#6 9 87 1484
#7 3 19 1951
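
As a minimal sketch of the same idea, the join columns can be named explicitly with `by`, which avoids relying on dplyr guessing the common columns. `semi_join` is shown as a possible alternative: it only filters the left table and does not duplicate rows if the lookup table contains repeated keys. The toy data frames `df_small` and `x_small` are made up for illustration and are not the data from the question.

```r
library(dplyr)

# Hypothetical toy data (not the data from the question)
df_small <- data.frame(X = c(7, 7, 5, 9, 3),
                       Y = c(14, 20, 13, 87, 50),
                       Z = c(1001, 1002, 1003, 1004, 1005))
x_small <- data.frame(X = c(7, 5, 9),
                      Y = c(14, 13, 87))

# Same idea as inner_join(df, x), with the join columns spelled out
joined <- inner_join(df_small, x_small, by = c("X", "Y"))

# semi_join keeps the rows of df_small that have a match in x_small
filtered <- semi_join(df_small, x_small, by = c("X", "Y"))
```

Both results keep the rows in the order they appear in `df_small`, but neither keeps the original row names.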

If the original row names of df need to be preserved as well:

df[do.call(paste, df[names(x)]) %in% do.call(paste, x),]
#   X  Y    Z
#16 7 14 1632
#28 9 87 1071
#30 3 19 1297
#38 7 14 1701
#67 5 13 1323
#77 9 87 1484
#88 3 19 1951
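
The sketch below tries to show why the `==` attempt from the question finds so few rows while the pasted-key approach finds all of them. `==` compares position by position and recycles the shorter vector, so each row of the larger table is tested against only one row of the lookup table. The toy data frames `df_small` and `x_small` are assumptions made up for illustration.

```r
# Hypothetical toy data (not the data from the question)
df_small <- data.frame(X = c(5, 7, 7, 5),
                       Y = c(13, 14, 14, 99),
                       Z = 1:4)
x_small <- data.frame(X = c(7, 5),
                      Y = c(14, 13))

# `==` recycles x_small$X and x_small$Y along df_small:
# row i of df_small is only tested against row ((i - 1) %% 2) + 1 of x_small
elementwise <- df_small$X == x_small$X & df_small$Y == x_small$Y

# Pasting the key columns into one string per row lets %in% test
# each row of df_small against every row of x_small
key_df <- do.call(paste, df_small[names(x_small)])
key_x  <- do.call(paste, x_small)
by_key <- key_df %in% key_x

# Logical indexing keeps the original row names
df_small[by_key, ]
```

Here `elementwise` is TRUE only for the third row, whereas `by_key` is TRUE for the first three rows, which are the ones whose (X, Y) pair appears anywhere in `x_small`.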

Upvotes: 2
