Reputation: 1332
Sample data:
set.seed(4)
cl <- sample(LETTERS[1:2], 100, replace = T)
seller <- round(runif(100, min=1, max=80))
df <- data.frame(cl, seller)
cl seller
1 B 21
2 A 51
3 A 22
4 A 43
5 A 38
6 B 46
7 A 54
8 B 18
9 A 78
.......
99 A 32
100 B 8
I want to check the number of times one unique value of seller
has occurred for both A
and B
. Suppose, in the data frame with this particular seed, you'll see 7 has appeared for both A and B, so 7 will be counted.
My attempt:
df %>%
filter(cl=='A')-> d1
df %>%
filter(cl=='B')-> d2
d3 <- merge(d1, d2, by='seller') %>%
distinct(seller)
nrow(d3)
17
So, 17 sellers have both cl: A and B.
My attempt, so far, has been very sub-optimal. It yields the result, but there has to be a better way with dplyr
or even with base R
which I can't figure out. Also, it will be very time-consuming for a bigger data set if I do it like this.
How can I solve this in a better, more tidy way?
Upvotes: 1
Views: 988
Reputation: 886938
We could use n_distinct
(assuming only 'A', 'B' values found in 'cl' column):
library(dplyr)
df %>%
group_by(seller) %>%
summarise(n = n_distinct(cl), .groups = 'drop') %>%
filter(n == 2) %>%
nrow
Output:
[1] 17
Or may also do
df %>%
group_by(seller) %>%
summarise(n = all(c("A", "B") %in% cl), .groups = 'drop') %>%
pull(n) %>%
sum
[1] 17
Upvotes: 2
Reputation: 388797
A base R approach using table
, colSums
and sum
sum(colSums(table(df) > 0) == 2)
#[1] 17
Upvotes: 2