Reputation: 3432
Hello I have a df such as
color type Nb
1 black chair 3
2 black chair 3
3 black chair 3
4 green sofa2 1
5 green sofa3 1
6 blue plate 2
7 blue plate 2
8 blue sofa 2
8 blue sofa 2
I would like to count within each color
the number of duplicated type
and add it to a Newcolumn
but only if nb>1
then I should get :
color type Nb Newcolumn
1 black chair 3 1
2 black chair 3 1
3 black chair 3 1
4 green sofa2 1 0
5 green sofa3 1 0
6 blue plate 2 2
7 blue plate 2 2
8 blue sofa 2 2
9 blue sofa 2 2
for instance within the group black
the wordchair
is duplicated and the nb
is >1
so I add 1
.
within the group green
there is no duplicated word so I add 0
.
within the group red
there are 2 words duplicated plate and sofa and their Nb > 1
then I add 2
to the Newcolumn
.
Does someone have an idea ?
I only know how to count the number of duplicated words within each groups by using :
tab %>%
group_by(color) %>%
mutate(Newcolumn = n_distinct(type))
Upvotes: 1
Views: 55
Reputation: 887048
We can use data.table
methods
library(data.table)
setDT(df)[, Newcolumn := 0][Nb > 1, Newcolumn := uniqueN(type), color]
-output
df
# color type Nb Newcolumn
#1: black chair 3 1
#2: black chair 3 1
#3: black chair 3 1
#4: green sofa2 1 0
#5: green sofa3 1 0
#6: blue plate 2 2
#7: blue plate 2 2
#8: blue sofa 2 2
#9: blue sofa 2 2
df <- structure(list(color = c("black", "black", "black", "green",
"green", "blue", "blue", "blue", "blue"), type = c("chair", "chair",
"chair", "sofa2", "sofa3", "plate", "plate", "sofa", "sofa"),
Nb = c(3L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 2L)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9"))
Upvotes: 1
Reputation: 4344
the dplyr::n_distinct() should do the trick:
library(dplyr)
df <- dplyr::tibble(color = c("black", "black", "black", "green", "green", "blue", "blue", "blue", "blue"),
type =c("chair", "chair", "chair", "sofa2", "sofa3", "plate", "plate", "sofa", "sofa"),
Nb = c(3,3,3,1,1,2,2,2,2))
# calculations
df %>%
dplyr::group_by(color) %>%
dplyr::mutate(distinct = dplyr::n_distinct(type)) %>%
dplyr::mutate(Newcolumn = ifelse(Nb > 1, distinct , 0))
color type Nb distinct Newcolumn
<chr> <chr> <dbl> <int> <dbl>
1 black chair 3 1 1
2 black chair 3 1 1
3 black chair 3 1 1
4 green sofa2 1 2 0
5 green sofa3 1 2 0
6 blue plate 2 2 2
7 blue plate 2 2 2
8 blue sofa 2 2 2
9 blue sofa 2 2 2
Upvotes: 1