Reputation: 67

How do I count the number of duplicates or occurrences of values in each row?

I have a data frame df with a bunch of location data that looks like this:

ID   ZIP1    ZIP2    ZIP3    ZIP4
1    98109   01720   98109   94118
2    94118   01720   01718   94109
3    01720   95872   95872   01720
4    01718   94109   94118   01720
5    94109   94109   95872   94109
6    94118   01720   94109   95872

I'd like to return the number of individuals have the same ZIP code in multiple columns (i.e. #1 has 98109 showing up twice, but #2 has no duplicate codes, etc.), so the correct value should be 3 individuals.

I've tried anyDuplicated(df$ZIP1, df$ZIP2, df$ZIP3, df$ZIP4) but it is not giving me the correct value.

How can I obtain the correct value, whether that is in multiple steps or in one immediate calculation?

Upvotes: 2

Answers (2)

h3ab74

Reputation: 328

Just try using the table function that comes with R and go table(df). If I understand correctly this should do the trick.

If you want to see the abundance of these ZIP columns independently, you could just go table(df$ZIP1) and it will show you the number of repeats in a tabular format.

Upvotes: 0

Ronak Shah

Reputation: 389275

We could use apply row-wise and find out if there is any duplicate and count the sum.

sum(apply(df, 1, function(x) any(duplicated(x))))
#[1] 3

Or with anyDuplicated

sum(apply(df, 1, anyDuplicated) > 0)
#[1] 3

Another option using tidyverse

library(tidyverse)


df %>%
  gather(key, value, -ID) %>%
  group_by(ID) %>%
  summarise(dupe = any(duplicated(value))) %>%
  ungroup() %>%
  summarise(final = sum(dupe)) %>%
  pull(final)

#[1] 3

df %>%
  gather(key, value, -ID) %>%
  group_by(ID) %>%
  summarise(dupe = anyDuplicated(value)) %>%
  ungroup() %>%
  summarise(final = sum(dupe > 0)) %>%
  pull(final)

#[1] 3

Upvotes: 3

How do I count the number of duplicates or occurrences of values in each row?

Answers (2)

Related Questions