Village.Idyot

Reputation: 2043

How to select unique rows from data frame subject to conditions using dplyr or base R?

I have the following data frame called test:

> test
  concat grpRnk
1    1.1      1
2    1.2      1
3    2.1      3
4    2.1      2
5    2.2      3
6    2.2      2
7    3.1      4
8    3.2      4

I run the dplyr code test %>% distinct(concat, .keep_all = TRUE), which keeps one row for each unique value in the concat column and gives the following output:

> test %>% distinct(concat, .keep_all = TRUE)
  concat grpRnk
1    1.1      1
2    1.2      1
3    2.1      3
4    2.2      3
5    3.1      4
6    3.2      4

How do I modify this code so that it instead removes rows 3 and 5 of the original test data frame, i.e. the duplicates where grpRnk is 3? The current code removes the duplicates where grpRnk is 2. A base R solution is fine too!

Here's the code for generating test data frame:

test <- data.frame(concat = c(1.1,1.2,2.1,2.1,2.2,2.2,3.1,3.2),
                   grpRnk = c(1,1,3,2,3,2,4,4))

Upvotes: 0

Views: 94

Answers (1)

an_ja

Reputation: 427

distinct() keeps the first row it encounters for each value of concat. So sort by grpRnk first, so that the row you want to keep comes first among its duplicates:

library(dplyr)

# Sort so the lowest grpRnk comes first, then keep the first row per concat value
test %>%
  arrange(grpRnk) %>%
  distinct(concat, .keep_all = TRUE)
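
Since base R was mentioned as an option, the same sort-then-deduplicate idea can be sketched without dplyr. This is only a minimal sketch; the names sorted and result are made up for illustration, and it assumes that the row with the lowest grpRnk is the one to keep.

```r
test <- data.frame(concat = c(1.1, 1.2, 2.1, 2.1, 2.2, 2.2, 3.1, 3.2),
                   grpRnk = c(1, 1, 3, 2, 3, 2, 4, 4))

# Sort by grpRnk so the lowest rank comes first for each concat value
sorted <- test[order(test$grpRnk), ]

# duplicated() flags every occurrence after the first one,
# so negating it keeps only the first row per concat value
result <- sorted[!duplicated(sorted$concat), ]
result
```

Like the arrange() version, this returns the rows in grpRnk order rather than in their original order.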

If, as you write, the choice depends on the values of other columns, it might be safer to take an intermediate step and create a new variable that flags every row whose concat value occurs more than once. This gives you more control, and you can delete the unwanted rows in a separate step.

test %>% 
  mutate(dup = duplicated(concat, fromLast = TRUE) | duplicated(concat))
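
The separate deletion step could then look roughly like the sketch below. The rule used here (among flagged rows, keep only the one with the smallest grpRnk per concat value) is an assumption based on the example data, and flagged and result are illustrative names; adapt the filter() condition to whatever rule you actually need.

```r
library(dplyr)

test <- data.frame(concat = c(1.1, 1.2, 2.1, 2.1, 2.2, 2.2, 3.1, 3.2),
                   grpRnk = c(1, 1, 3, 2, 3, 2, 4, 4))

# Step 1: flag every row whose concat value occurs more than once
flagged <- test %>%
  mutate(dup = duplicated(concat, fromLast = TRUE) | duplicated(concat))

# Step 2: keep unflagged rows as they are; among flagged rows,
# keep only those with the smallest grpRnk for their concat value
result <- flagged %>%
  group_by(concat) %>%
  filter(!dup | grpRnk == min(grpRnk)) %>%
  ungroup() %>%
  select(-dup)
result
```

Unlike the arrange() approach, this keeps the remaining rows in their original order. If two duplicates share the same minimum grpRnk, both are kept, so a further tie-breaking rule would be needed in that case.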

Upvotes: 1
