Reputation: 423
This seems pretty straightforward but I am stumped. I have a data frame that looks like this:
df1 %>% head()
values paired
<ch> <int>
1 apples 1
2 x 1
3 oranges 2
4 z 2
5 bananas 3
6 y 3
7 apples 4
8 p 4
I would like to create a new data frame that extracts all paired values based on a search criteria. So if I want all pairs that correspond to apples I would like to end up with something like this:
df1 %>% head()
values paired
<ch> <int>
1 apples 1
2 x 1
3 apples 4
4 p 4
I have tried using:
new_pairs <- df1 %>%
arrange(values, paired) %>%
filter(duplicated(paired) == TRUE,
values=="apples")
But I am getting only the apple rows back
Upvotes: 0
Views: 609
Reputation: 46886
In base R, find the values of the pairs of interest
pairs = subset(df1, values %in% "apples")$paired
and create a subset of the data
subset(df1, paired %in% pairs)
Upvotes: 0
Reputation: 16876
Here is another tidyverse
possibility. I filter
for the rows that have apples
and also keep the rows that immediately follow apples
.
library(tidyverse)
df %>%
filter((values == "apples" |
row_number() %in% c(which(
values == "apples", arr.ind = TRUE
) + 1)))
Output
values paired
1 apples 1
2 x 1
3 apples 4
4 p 4
Here is a data.table
option (subset
is only used to change the order of the columns):
library(data.table)
dt <- as.data.table(df)
subset(dt[, .SD[any(values == "apples")], by = paired], select = c("values", "paired"))
values paired
1: apples 1
2: x 1
3: apples 4
4: p 4
Data
df <-
structure(list(
values = c("apples", "x", "oranges", "z", "bananas",
"y", "apples", "p"),
paired = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L)
),
class = "data.frame",
row.names = c(NA,-8L))
Upvotes: 1
Reputation: 33812
You'll need to group on the paired variable before filtering.
How about:
df1 %>%
group_by(paired) %>%
filter("apples" %in% values) %>%
ungroup()
Result:
# A tibble: 4 x 2
values paired
<chr> <int>
1 apples 1
2 x 1
3 apples 4
4 p 4
Your data:
df1 <- structure(list(values = c("apples", "x", "oranges", "z", "bananas", "y", "apples", "p"),
paired = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L)),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))
Upvotes: 3