Reputation: 41
I have a data frame with three columns and thousands of rows. The first two columns (x and y) contain character strings, and the third (z) contains numeric data. I need to subset the data frame based on matching values in both of the first two columns.
x <- c("a", "b", "c", "d", "f", "g", "h", "i", "j", "k")
y <- c("h", "b", "k", "a", "g", "d", "i", "c", "f", "j")
z <- c(1:10)
df <- data.frame(x, y, z)
x y z
1 a h 1
2 b b 2
3 c k 3
4 d a 4
5 f g 5
6 g d 6
7 h i 7
8 i c 8
9 j f 9
10 k j 10
Say this is my table, and the values I am interested in are "a", "c", "f", "h" and "k". I only want to return the rows in which both x and y contain one of the five, so in this case rows 1 and 3.
I've tried:
df2 <- filter(df,
x == ("a" | "c" | "f" | "h" | "k") &
y == ("a" | "c" | "f" | "h" | "k"))
but this doesn't work for factors or character strings. Is there an equivalent or another way around this?
Thanks in advance.
Upvotes: 3
Views: 7973
Reputation: 2082
What about:
df2 <- filter(df, grepl("[acfhk]",x) & grepl("[acfhk]",y))
using dplyr
package
Upvotes: 0
Reputation: 5193
With dplyr
df2 <- filter(df,
x %in% c("a" ,"c","f" ,"h","k") &
y %in% c("a" ,"c","f" ,"h","k"))
df2
x y z
1 a h 1
2 c k 3
Upvotes: 0
Reputation: 38500
I think this returns what you are looking for:
# build vector of necessary elements
mustHaves <- c("a", "c", "f", "h", "k")
# perform subsetting
df[with(df, x %in% mustHaves & y %in% mustHaves),]
x y z
1 a h 1
3 c k 3
data
df <- data.frame(x, y, z, stringsAsFactors = FALSE)
Upvotes: 4