1S1a4m9

Reputation: 27

Find All Unique rows based on single column and exclude all duplicate rows

I have two requirements:

  1. Find all duplicate values in a single column.
  2. Find all unique rows (the opposite of the first question). The result should not include even a single row from the duplicated set.

I have been learning R for the last two weeks, watching YouTube videos and referring to Stack Overflow and other websites, so I don't know much yet. Please point me to any material or courses.

I found the answer to my first question here (Find duplicated elements with dplyr):

# All duplicated elements
mtcars %>%
  filter(carb %in% unique(.[["carb"]][duplicated(.[["carb"]])]))

So I want the opposite of this.

Thanks

P.S. I have a non-technical background. I went through a couple of questions and answers here, so I might have found the answer, or one that needed only a few tweaks, and completely missed it.

Upvotes: 0

Views: 214

Answers (1)

Konrad Rudolph

Reputation: 546153

As you probably realised, unique and duplicated don’t quite do what you need, because they essentially retain all distinct values and merely collapse “multiple copies” of those values.

For your first question, you can group_by the column that you’re interested in, and then retain just those groups (via filter) which have more than one row:

mtcars %>%
    group_by(mpg) %>%
    filter(length(mpg) > 1) %>%
    ungroup()

This example selects all rows for which the mpg value is duplicated. This works because, when applied to groups, dplyr operations such as filter work on each group individually. This means that length(mpg) in the above code will return the length of the mpg column vector of each group, separately.
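Equivalently (this variant is my own suggestion, not part of the original answer), dplyr also provides the n() helper, which returns the number of rows in the current group and reads a little more directly than length(mpg):

```r
library(dplyr)

# Keep groups with more than one row, i.e. rows whose mpg value is duplicated.
# n() gives the row count of the current group inside a grouped filter().
mtcars %>%
    group_by(mpg) %>%
    filter(n() > 1) %>%
    ungroup()
```

Both versions produce the same result; n() simply avoids naming the grouping column twice.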

To invert the logic, it’s enough to invert the filtering condition:

mtcars %>%
    group_by(mpg) %>%
    filter(length(mpg) == 1) %>%
    ungroup()

Upvotes: 2

Related Questions