Reputation: 67
I have a data frame df
with a bunch of location data that looks like this:
ID ZIP1 ZIP2 ZIP3 ZIP4
1 98109 01720 98109 94118
2 94118 01720 01718 94109
3 01720 95872 95872 01720
4 01718 94109 94118 01720
5 94109 94109 95872 94109
6 94118 01720 94109 95872
I'd like to return the number of individuals have the same ZIP code in multiple columns (i.e. #1 has 98109 showing up twice, but #2 has no duplicate codes, etc.), so the correct value should be 3 individuals.
I've tried anyDuplicated(df$ZIP1, df$ZIP2, df$ZIP3, df$ZIP4)
but it is not giving me the correct value.
How can I obtain the correct value, whether that is in multiple steps or in one immediate calculation?
Upvotes: 2
Views: 57
Reputation: 328
Just try using the table
function that comes with R and go table(df)
. If I understand correctly this should do the trick.
If you want to see the abundance of these ZIP columns independently, you could just go table(df$ZIP1)
and it will show you the number of repeats in a tabular format.
Upvotes: 0
Reputation: 389275
We could use apply
row-wise and find out if there is any
duplicate and count the sum
.
sum(apply(df, 1, function(x) any(duplicated(x))))
#[1] 3
Or with anyDuplicated
sum(apply(df, 1, anyDuplicated) > 0)
#[1] 3
Another option using tidyverse
library(tidyverse)
df %>%
gather(key, value, -ID) %>%
group_by(ID) %>%
summarise(dupe = any(duplicated(value))) %>%
ungroup() %>%
summarise(final = sum(dupe)) %>%
pull(final)
#[1] 3
Or
df %>%
gather(key, value, -ID) %>%
group_by(ID) %>%
summarise(dupe = anyDuplicated(value)) %>%
ungroup() %>%
summarise(final = sum(dupe > 0)) %>%
pull(final)
#[1] 3
Upvotes: 3