Prasanna

Reputation: 25

Count distinct values by row within each group

I want to calculate, for each row, the running number of distinct values within each group in R. The count should not include blank (NA) cells. For example:

df<-data.frame(
  Group=c("A1","A1","A1","A1","A1","B1","B1","B1"),
  Segment=c("A",NA,"A","B","A",NA,"A","B")
)

INPUT:

 
+---------+--------+
| Group   |Segment |
+---------+--------+
| A1      |A       |
| A1      |NA      |
| A1      |A       |
| A1      |B       |
| A1      |A       |
| B1      |NA      |
| B1      |A       |
| B1      |B       |
+---------+--------+

I solved this with a for loop, but on a large dataset it takes too long to get the result.

Expected output, in the distinct column:

 
+---------+--------+----------+
| Group   |Segment | distinct |
+---------+--------+----------+
| A1      |A       |    1     |
| A1      |NA      |    1     |
| A1      |A       |    1     |
| A1      |B       |    2     |
| A1      |A       |    2     |
| B1      |NA      |    0     |
| B1      |A       |    1     |
| B1      |B       |    2     |
+---------+--------+----------+
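To pin the definition down, here is a small base-R sketch. distinct_so_far() is a hypothetical helper, not from the question: for each position k it counts the distinct non-NA values among the first k Segment values of a group. It is quadratic, so it is only meant as a correctness check for faster solutions on small data, not as a replacement for the slow loop. Under this definition the last B1 row (B, after NA and A) gets 2.

```r
df <- data.frame(
  Group = c("A1","A1","A1","A1","A1","B1","B1","B1"),
  Segment = c("A",NA,"A","B","A",NA,"A","B")
)

# Hypothetical helper: for each position k, count the distinct non-NA
# values in x[1..k]. O(n^2), so only suitable as a reference check.
distinct_so_far <- function(x) {
  vapply(seq_along(x), function(k) {
    seen <- x[seq_len(k)]
    length(unique(seen[!is.na(seen)]))
  }, integer(1))
}

# Apply the helper within each Group, then put results back in row order
df$distinct <- unsplit(lapply(split(df$Segment, df$Group), distinct_so_far), df$Group)
df
```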

Upvotes: 1

Views: 78

Answers (1)

Axeman

Reputation: 35262

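duplicated() is useful for this, although the NAs make it a bit tricky: flag the first occurrence of each value within the group, exclude NA, and take the cumulative sum of those flags: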

library(dplyr)
df %>% 
  group_by(Group) %>% 
  mutate(distinct = cumsum(!duplicated(Segment) & !is.na(Segment)))
# A tibble: 8 x 3
# Groups:   Group [2]
  Group Segment distinct
  <fct> <fct>      <int>
1 A1    A              1
2 A1    NA             1
3 A1    A              1
4 A1    B              2
5 A1    A              2
6 B1    NA             0
7 B1    A              1
8 B1    B              2
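If dplyr is not available, the same duplicated()/cumsum() idea can be written in base R with ave(). This is a minimal sketch assuming the df from the question; passing row indices to ave() (rather than Segment itself) is simply one way to let FUN look up each group's Segment values, and is a design choice, not the only option.

```r
df <- data.frame(
  Group = c("A1","A1","A1","A1","A1","B1","B1","B1"),
  Segment = c("A",NA,"A","B","A",NA,"A","B")
)

# ave() splits its first argument by Group, applies FUN to each piece,
# and returns a vector aligned with the original rows.
df$distinct <- ave(
  seq_along(df$Segment),   # row indices, used to look up Segment per group
  df$Group,
  FUN = function(i) {
    s <- df$Segment[i]
    # TRUE for the first occurrence of each non-NA value; running total
    cumsum(!duplicated(s) & !is.na(s))
  }
)
df
```

Like the dplyr version, this does one vectorised pass per group instead of a row-by-row loop, which should scale better on a large dataset.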

Upvotes: 3
