Reputation: 109
Tibble:
df <- tibble(x = c(1, 3, 2, 5, 3, 7, 1, 9),
y = c(rep("condition_one", 5), rep("condition_two", 3)))
Goal:
I want to return rows of 'non-unique' values in x
AND for condition_one
in y
. In this example it would be row 2 and 5 because the value 3 is not unique in x
and y
is condition_one
.
What I first tried was using the duplicated()
function from base R:
df %>%
filter(duplicated(.$x) & y == "condition_one")
Output:
# A tibble: 1 x 2
x y
<dbl> <chr>
1 3 condition_one
Though it only returns the duplicated row and not the "original" one. What would be a solution in this case that I can also include in a dplyr pipe chain?
EDIT:
I see that some answers still don't solve my problem and I think it's because I phrased my question badly:
In my filtered output, I only want rows where the value in x
was not unique. But I want R to return ALL the non-unique rows, not only the one where the duplicate value first occured (which is happening with duplicate()
).
On top of that, I want y == "condition_one"
Upvotes: 1
Views: 1539
Reputation: 1
With base R. It is not pretty, but it works and includes row 1, which I believe it should:
df <- tibble(row = 1:8,
x = c(1, 3, 2, 5, 3, 7, 1, 9),
y = c(rep("condition_one", 5), rep("condition_two", 3)))
selection <- df[df$x %in% df$x[duplicated(df$x)] & df$y == "condition_one",]
or with larger data perhaps:
selection2 <- df[df$x %in% unique(df$x[duplicated(df$x)]) & df$y == "condition_one",]
Upvotes: 0
Reputation: 171
I recently found out this package called campfin
and it contains a function, flag_dupes()
which does exactly what you want. In a nutshell, it adds a logical column it calls dupe_flag
which you can use to filter out unique rows.
Upvotes: 0
Reputation: 1751
I'm not sure I get if you want to find duplicates in the universe where df$y meets "condition_one" or the opposite. I'll write both cases.
In base R:
Case 1: Find all duplicates and then find which ones meet condition.
all_duplicates <- df[duplicated(df$x) | duplicated(df$x, fromLast = TRUE), ]
your_condition <- all_duplicates[all_duplicates$y == "condition_one", ]
Case 2: Find which ones meet condition and then find duplicates there.
df_cond <- df[df$y == "condition_one", ]
all_duplicates <- df_cond[duplicated(df_cond$x) | duplicated(df_cond$x, fromLast = TRUE), ]
Upvotes: 2
Reputation: 68
Here is the solution:
df %>%
group_by(x) %>%
filter(y == "condition_one" & n() == 1) %>%
ungroup()
#Results:
# A tibble: 2 x 2
x y
<dbl> <chr>
1 2 condition_one
2 5 condition_one
Upvotes: 2
Reputation: 887118
Using dplyr
library(dplyr)
df %>%
filter(duplicated(cur_data()) & y == 'condition_one')
Upvotes: 0