Reputation: 25
I want to calculate a running count of unique values, row by row within each group, in R. The count should not include blank (NA) cells. For example:
df <- data.frame(
  Group   = c("A1", "A1", "A1", "A1", "A1", "B1", "B1", "B1"),
  Segment = c("A", NA, "A", "B", "A", NA, "A", "B")
)
INPUT:
+---------+---------+
| Group   | Segment |
+---------+---------+
| A1      | A       |
| A1      | NA      |
| A1      | A       |
| A1      | B       |
| A1      | A       |
| B1      | NA      |
| B1      | A       |
| B1      | B       |
+---------+---------+
I solved this with a for loop, but on a large dataset it takes too long to produce the result.
Expected output (the distinct column):
+---------+---------+----------+
| Group   | Segment | distinct |
+---------+---------+----------+
| A1      | A       | 1        |
| A1      | NA      | 1        |
| A1      | A       | 1        |
| A1      | B       | 2        |
| A1      | A       | 2        |
| B1      | NA      | 0        |
| B1      | A       | 1        |
| B1      | B       | 1        |
+---------+---------+----------+
Upvotes: 1
Views: 78
Reputation: 35262
duplicated is useful for this, although the NAs make it a bit tricky:
library(dplyr)

df %>%
  group_by(Group) %>%
  # count a row only when it is the first occurrence of a non-NA Segment
  mutate(distinct = cumsum(!duplicated(Segment) & !is.na(Segment)))
# A tibble: 8 x 3
# Groups:   Group [2]
  Group Segment distinct
  <fct> <fct>      <int>
1 A1    A              1
2 A1    NA             1
3 A1    A              1
4 A1    B              2
5 A1    A              2
6 B1    NA             0
7 B1    A              1
8 B1    B              2
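If you would rather not depend on dplyr, the same idea can probably be sketched in base R with split() and unsplit(). This is only an illustration, not part of the approach above: the helper name running_distinct is made up here, and stringsAsFactors = FALSE is an assumption so that Segment stays a character column.

```r
# Same data as in the question; stringsAsFactors = FALSE is assumed here
df <- data.frame(
  Group   = c("A1", "A1", "A1", "A1", "A1", "B1", "B1", "B1"),
  Segment = c("A", NA, "A", "B", "A", NA, "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: running count of distinct non-NA values in x.
# duplicated() treats the first NA as a new value, so NAs are
# excluded explicitly with !is.na(x).
running_distinct <- function(x) cumsum(!duplicated(x) & !is.na(x))

# Apply the helper within each group, then restore the original row order
df$distinct <- unsplit(
  lapply(split(df$Segment, df$Group), running_distinct),
  df$Group
)

df
```

As in the dplyr version, the work is done by vectorised calls within each group rather than by a row-by-row loop, which is usually what helps on larger data.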
Upvotes: 3