Shibaprasad
Shibaprasad

Reputation: 1332

How can I check if unique values of a column has multiple occurrences for different values of another column in R?

Sample data:

set.seed(4)
cl <- sample(LETTERS[1:2], 100, replace = T)
seller <- round(runif(100, min=1, max=80))

df <- data.frame(cl, seller)

    cl seller
1    B     21
2    A     51
3    A     22
4    A     43
5    A     38
6    B     46
7    A     54
8    B     18
9    A     78
.......

99   A     32
100  B      8

I want to check the number of times one unique value of seller has occurred for both A and B. Suppose, in the data frame with this particular seed, you'll see 7 has appeared for both A and B, so 7 will be counted.

My attempt:

df %>%
  filter(cl=='A')-> d1

df %>%
  filter(cl=='B')-> d2

d3 <- merge(d1, d2, by='seller') %>%
  distinct(seller)

nrow(d3)
17

So, 17 sellers have both cl: A and B.

My attempt, so far, has been very sub-optimal. It yields the result, but there has to be a better way with dplyr or even with base R which I can't figure out. Also, it will be very time-consuming for a bigger data set if I do it like this.

How can I solve this in a better, more tidy way?

Upvotes: 1

Views: 988

Answers (2)

akrun
akrun

Reputation: 886938

We could use n_distinct (assuming only 'A', 'B' values found in 'cl' column):

library(dplyr)
df %>%
    group_by(seller) %>%
    summarise(n = n_distinct(cl), .groups = 'drop') %>%
    filter(n == 2) %>%
    nrow

Output:

[1] 17

Or may also do

df %>%
    group_by(seller) %>%
    summarise(n = all(c("A", "B") %in% cl), .groups = 'drop') %>%
    pull(n) %>%
    sum
[1] 17

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388797

A base R approach using table, colSums and sum

sum(colSums(table(df) > 0) == 2)
#[1] 17

Upvotes: 2

Related Questions