user3570187
user3570187

Reputation: 1773

creating distinct values column till certain time

I have a question on how to count unique values till certain point in time. For example, I want to know how many unique location a person has lived till that point.

 created<- c(2009,2010,2010,2011, 2012, 2011)
 person <- c(A, A, A, A, B, B)
 location<- c('London','Geneva', 'London', 'New York', 'London', 'London')
 df <- data.frame (created, person, location)

I want to create a variable called unique that takes into consideration how many distinct places he has lived till that point in time. I have tried the following. Any suggestions?

  library(dplyr) 
   df %>% group_by(person, location) %>% arrange(Created,.by_group = TRUE) %>% mutate (unique=distinct (location))

  unique <- c(1, 2, 2, 3,1,1)

Upvotes: 2

Views: 46

Answers (2)

akrun
akrun

Reputation: 887541

We can use cummax

library(dplyr)
df %>% 
   group_by(person) %>% 
   mutate(unique = cummax(match(location, unique(location))))
# A tibble: 6 x 4
# Groups:   person [2]
#  created person location unique
#    <dbl> <fct>  <fct>     <int>
#1    2009 A      London        1
#2    2010 A      Geneva        2
#3    2010 A      London        2
#4    2011 A      New York      3
#5    2012 B      London        1
#6    2011 B      London        1

Or with base R

df$unique <- with(df, ave(location, person, FUN =
          function(x) cummax(match(x, unique(x)))))

data

df <- structure(list(created = c(2009, 2010, 2010, 2011, 2012, 2011
), person = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), location = structure(c(2L, 1L, 2L, 3L, 
2L, 2L), .Label = c("Geneva", "London", "New York"), class = "factor")),
class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389155

One way is to use cumsum and duplicated

library(dplyr)
df %>% group_by(person) %>% mutate(unique = cumsum(!duplicated(location)))

#  created person location unique
#    <dbl> <fct>  <fct>     <int>
#1    2009 A      London        1
#2    2010 A      Geneva        2
#3    2010 A      London        2
#4    2011 A      New York      3
#5    2012 B      London        1
#6    2011 B      London        1

Upvotes: 1

Related Questions