codekoala
codekoala

Reputation: 21

How can I create a new column with a count of unique values in df by group in r

I have a data frame df that looks like this

ID   Color   Street   Apt   Year  
1    blue      14      8    1986    
1    red       14      9    1986    
2    blue      10      1    1990    
3    pink      70      5    1970    
3    blue      70      6    1970    
3    blue      17      2    1970    

I want to create a new column that shows the number of unique values in color street and apt, per ID.

ID   Color   Street   Apt   Year  Count
1    blue      14      8    1986    5
1    red       14      9    1986    5
2    blue      10      1    1990    3
3    pink      70      5    1970    7
3    blue      70      6    1970    7
3    blue      17      2    1970    7

I have tried several ways but either fail to loop through the rows, getting the same value repeated down the count coulmn. I have tried as.character(), as one of the columns is non-numeric.

I think the ideal solution would be to populate a list with values from the 3 columns per id, then count unique values in that list, but I am not sure how to do this in R. Seems simpler in Python

Upvotes: 2

Views: 701

Answers (1)

akrun
akrun

Reputation: 887048

After grouping by 'ID' we get the n_distinct of 'Color', 'Street', 'Apt' and sum it together to mutate with the 'Count' column

library(dplyr)
df1 %>% 
    group_by(ID) %>%
    mutate(Count = sum(c(n_distinct(Color), n_distinct(Street), n_distinct(Apt))))
# A tibble: 6 x 6
# Groups:   ID [3]
#     ID Color Street   Apt  Year Count
#   <int> <chr>  <int> <int> <int> <int>
#1     1  blue    142     8  1986     5
#2     1   red    142     9  1986     5
#3     2  blue    102     1  1990     3
#4     3  pink     70     5  1970     7
#5     3  blue     70    20  1970     7
#6     3  blue    107     2  1970     7

Upvotes: 3

Related Questions