chippycentra
chippycentra

Reputation: 3432

Count number of different values within columns groupby r

I have a df such as:

COL1 COL2 
G1   SP1  
G1   SP2  
G1   SP1  
G2   SP1  
G2   SP6  
G2   SP9  
G3   SP9  
G4   SP9  
G4   SP2  

and I would like to add a COL3 which is by group of COL1 the number of different COL2 values:

COL1 COL2 COL3
G1   SP1  2
G1   SP2  2
G1   SP1  2
G2   SP1  3
G2   SP6  3
G2   SP9  3
G3   SP9  1
G4   SP9  2
G4   SP2  2

Upvotes: 1

Views: 55

Answers (3)

akrun
akrun

Reputation: 886938

Another option with data.table

library(data.table)
setDT(df)[, COL3 := uniqueN(COL2), COL1]

Upvotes: 0

Agaz Wani
Agaz Wani

Reputation: 5684

Another way could be

df %>% 
  group_by(COL1) %>%
  mutate(COL3 = length(unique(c_across(COL2))))

# A tibble: 9 x 3
# Groups:   COL1 [4]
  COL1  COL2   COL3
  <chr> <chr> <int>
1 G1    SP1       2
2 G1    SP2       2
3 G1    SP1       2
4 G2    SP1       3
5 G2    SP6       3
6 G2    SP9       3
7 G3    SP9       1
8 G4    SP9       2
9 G4    SP2       2

Upvotes: 1

Allan Cameron
Allan Cameron

Reputation: 173793

You can use n_distinct(COL2) after grouping by COL1:

library(dplyr)

df %>% 
  group_by(COL1) %>% 
  mutate(COL3 = n_distinct(COL2))

#> # A tibble: 9 x 3
#> # Groups:   COL1 [4]
#>   COL1  COL2   COL3
#>   <fct> <fct> <int>
#> 1 G1    SP1       2
#> 2 G1    SP2       2
#> 3 G1    SP1       2
#> 4 G2    SP1       3
#> 5 G2    SP6       3
#> 6 G2    SP9       3
#> 7 G3    SP9       1
#> 8 G4    SP9       2
#> 9 G4    SP2       2

Upvotes: 2

Related Questions