Reputation: 21
I have a data frame df that looks like this
ID Color Street Apt Year
1 blue 14 8 1986
1 red 14 9 1986
2 blue 10 1 1990
3 pink 70 5 1970
3 blue 70 6 1970
3 blue 17 2 1970
I want to create a new column that shows the number of unique values in color street and apt, per ID.
ID Color Street Apt Year Count
1 blue 14 8 1986 5
1 red 14 9 1986 5
2 blue 10 1 1990 3
3 pink 70 5 1970 7
3 blue 70 6 1970 7
3 blue 17 2 1970 7
I have tried several ways but either fail to loop through the rows, getting the same value repeated down the count coulmn. I have tried as.character(), as one of the columns is non-numeric.
I think the ideal solution would be to populate a list with values from the 3 columns per id, then count unique values in that list, but I am not sure how to do this in R. Seems simpler in Python
Upvotes: 2
Views: 701
Reputation: 887048
After grouping by 'ID' we get the n_distinct
of 'Color', 'Street', 'Apt' and sum
it together to mutate
with the 'Count' column
library(dplyr)
df1 %>%
group_by(ID) %>%
mutate(Count = sum(c(n_distinct(Color), n_distinct(Street), n_distinct(Apt))))
# A tibble: 6 x 6
# Groups: ID [3]
# ID Color Street Apt Year Count
# <int> <chr> <int> <int> <int> <int>
#1 1 blue 142 8 1986 5
#2 1 red 142 9 1986 5
#3 2 blue 102 1 1990 3
#4 3 pink 70 5 1970 7
#5 3 blue 70 20 1970 7
#6 3 blue 107 2 1970 7
Upvotes: 3