1S1a4m9

Reputation: 27

Find All Unique rows based on single column and exclude all duplicate rows

I have two requirements:

  1. Find all duplicate values in a single column.
  2. Find all unique rows (the opposite of the first question). The result should not include even a single row from the duplicated set.

I have been learning R for the last two weeks, watching YouTube videos and referring to Stack Overflow and other websites, so I don't know much yet. Please point me to any material or courses.

I found the answer to my first question here (Find duplicated elements with dplyr):

# All duplicated elements
mtcars %>%
  filter(carb %in% unique(.[["carb"]][duplicated(.[["carb"]])]))

So I want the opposite of this.

Thanks

P.S. I have a non-technical background. I went through a couple of questions and answers here, so I might have found the answer, or one that needed only a few tweaks, and completely missed it.

Upvotes: 0

Views: 214

Answers (1)

Konrad Rudolph

Reputation: 546153

As you probably realised, unique and duplicated don’t quite do what you need, because they essentially retain all distinct values and merely collapse “multiple copies” of those values.

For your first question, you can group_by the column that you’re interested in, and then retain just those groups (via filter) which have more than one row:

mtcars %>%
    group_by(mpg) %>%
    filter(length(mpg) > 1) %>%
    ungroup()

This example selects all rows for which the mpg value is duplicated. This works because, when applied to groups, dplyr operations such as filter work on each group individually. This means that length(mpg) in the above code will return the length of the mpg column vector of each group, separately.
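Equivalently (this variant is my own suggestion, not part of the original answer), dplyr also provides the n() helper, which returns the number of rows in the current group and reads a little more directly than length(mpg):

```r
library(dplyr)

# Keep groups with more than one row, i.e. rows whose mpg value is duplicated.
# n() gives the row count of the current group inside a grouped filter().
mtcars %>%
    group_by(mpg) %>%
    filter(n() > 1) %>%
    ungroup()
```

Both versions produce the same result; n() simply avoids naming the grouping column twice.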

To invert the logic, it’s enough to invert the filtering condition:

mtcars %>%
    group_by(mpg) %>%
    filter(length(mpg) == 1) %>%
    ungroup()

Upvotes: 2

Related Questions