chippycentra
chippycentra

Reputation: 3432

Using dplyr to create a new column and count duplicate with threshold

Hello I have a df such as

  color  type Nb 
1 black chair 3
2 black chair 3 
3 black chair 3
4 green sofa2 1
5 green sofa3 1
6  blue plate 2
7  blue plate 2
8  blue sofa  2
8  blue sofa  2

I would like to count within each color the number of duplicated type and add it to a Newcolumn but only if nb>1

then I should get :

  color  type Nb Newcolumn
1 black chair 3  1
2 black chair 3  1
3 black chair 3  1
4 green sofa2 1  0
5 green sofa3 1  0
6 blue  plate 2  2
7 blue  plate 2  2
8 blue  sofa  2  2
9 blue  sofa  2  2

for instance within the group black the wordchair is duplicated and the nb is >1 so I add 1.

within the group green there is no duplicated word so I add 0.

within the group red there are 2 words duplicated plate and sofa and their Nb > 1 then I add 2 to the Newcolumn.

Does someone have an idea ?

I only know how to count the number of duplicated words within each groups by using :

tab %>%
  group_by(color) %>%
  mutate(Newcolumn = n_distinct(type))

Upvotes: 1

Views: 55

Answers (2)

akrun
akrun

Reputation: 887048

We can use data.table methods

library(data.table)
setDT(df)[, Newcolumn := 0][Nb > 1, Newcolumn := uniqueN(type), color]

-output

df
#   color  type Nb Newcolumn
#1: black chair  3         1
#2: black chair  3         1
#3: black chair  3         1
#4: green sofa2  1         0
#5: green sofa3  1         0
#6:  blue plate  2         2
#7:  blue plate  2         2
#8:  blue  sofa  2         2
#9:  blue  sofa  2         2

data

df <- structure(list(color = c("black", "black", "black", "green", 
"green", "blue", "blue", "blue", "blue"), type = c("chair", "chair", 
"chair", "sofa2", "sofa3", "plate", "plate", "sofa", "sofa"), 
    Nb = c(3L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 2L)), class = "data.frame",
    row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9"))

Upvotes: 1

DPH
DPH

Reputation: 4344

the dplyr::n_distinct() should do the trick:

library(dplyr)

df <- dplyr::tibble(color = c("black", "black", "black", "green", "green", "blue", "blue", "blue", "blue"),
                type =c("chair", "chair", "chair", "sofa2", "sofa3", "plate", "plate", "sofa", "sofa"),
                Nb = c(3,3,3,1,1,2,2,2,2))


# calculations
df %>%
  dplyr::group_by(color) %>% 
  dplyr::mutate(distinct = dplyr::n_distinct(type)) %>% 
  dplyr::mutate(Newcolumn = ifelse(Nb > 1, distinct , 0))

   color type     Nb distinct Newcolumn
  <chr> <chr> <dbl>  <int>     <dbl>
1 black chair     3      1         1
2 black chair     3      1         1
3 black chair     3      1         1
4 green sofa2     1      2         0
5 green sofa3     1      2         0
6 blue  plate     2      2         2
7 blue  plate     2      2         2
8 blue  sofa      2      2         2
9 blue  sofa      2      2         2

Upvotes: 1

Related Questions