Reputation: 1773
I have a question on how to count unique values till certain point in time. For example, I want to know how many unique location a person has lived till that point.
created<- c(2009,2010,2010,2011, 2012, 2011)
person <- c(A, A, A, A, B, B)
location<- c('London','Geneva', 'London', 'New York', 'London', 'London')
df <- data.frame (created, person, location)
I want to create a variable called unique that takes into consideration how many distinct places he has lived till that point in time. I have tried the following. Any suggestions?
library(dplyr)
df %>% group_by(person, location) %>% arrange(Created,.by_group = TRUE) %>% mutate (unique=distinct (location))
unique <- c(1, 2, 2, 3,1,1)
Upvotes: 2
Views: 46
Reputation: 887541
We can use cummax
library(dplyr)
df %>%
group_by(person) %>%
mutate(unique = cummax(match(location, unique(location))))
# A tibble: 6 x 4
# Groups: person [2]
# created person location unique
# <dbl> <fct> <fct> <int>
#1 2009 A London 1
#2 2010 A Geneva 2
#3 2010 A London 2
#4 2011 A New York 3
#5 2012 B London 1
#6 2011 B London 1
Or with base R
df$unique <- with(df, ave(location, person, FUN =
function(x) cummax(match(x, unique(x)))))
df <- structure(list(created = c(2009, 2010, 2010, 2011, 2012, 2011
), person = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), location = structure(c(2L, 1L, 2L, 3L,
2L, 2L), .Label = c("Geneva", "London", "New York"), class = "factor")),
class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 0
Reputation: 389155
One way is to use cumsum
and duplicated
library(dplyr)
df %>% group_by(person) %>% mutate(unique = cumsum(!duplicated(location)))
# created person location unique
# <dbl> <fct> <fct> <int>
#1 2009 A London 1
#2 2010 A Geneva 2
#3 2010 A London 2
#4 2011 A New York 3
#5 2012 B London 1
#6 2011 B London 1
Upvotes: 1