Reputation: 391
I have a dataframe looks like below:
Year Person Office
2005 Peter Boston
2007 Peter Boston
2008 Peter Chicago
2009 Peter New York
2011 Peter New York
2003 Amy Seattle
2004 Amy Boston
2006 Amy Chicago
2007 Amy Chicago
I want to calculate a office-person level normalized measure (count) that captures the number of offices that a person experienced before coming to the current one. The measure is normalized by the total amount of years before arriving at the current position. Below is the ideal output. For Peter, Boston is his first office and thus, his normalized measure count for Boston is 0. For Peter, Chicago is his second office and he spent 2008-2005=3 years before coming to the Chicago office. Thus, his normalized measure count for Chicago is 1/3.
Office Person Count
Boston Peter 0
Boston Amy 1
Chicago Peter 1/3
Chicago Amy 2/3
New York Peter 1/2
Seattle Amy 0
Upvotes: 3
Views: 81
Reputation: 2623
library(tidyverse)
cities %>%
group_by(Person, Office) %>%
filter(row_number() == 1) %>%
group_by(Person) %>%
mutate(x = row_number()-1, y = (Year - Year[1])) %>%
mutate(count = ifelse(is.nan(x / y), x, x/y))
# Year Person Office x y test
# <int> <chr> <chr> <dbl> <int> <dbl>
# 1 2005 Peter "Boston" 0 0 0
# 2 2008 Peter "Chicago" 1 3 0.333
# 3 2009 Peter "New York" 2 4 0.5
# 4 2003 Amy "Seattle " 0 0 0
# 5 2004 Amy "Boston" 1 1 1
# 6 2006 Amy "Chicago" 2 3 0.667
If you want the count represented as a fraction, we can use a helper function from the package pracma
to reduce the fractions
cities %>%
group_by(Person, Office) %>%
filter(row_number() == 1) %>%
group_by(Person) %>%
mutate(x = row_number()-1, y = (Year - Year[1])) %>%
mutate(count = ifelse(is.nan(x / y), x, x/y)) %>%
mutate(frac = ifelse(x == 0,
0,
ifelse(x/y == 1, 1,
paste0(x / pracma::gcd(x,y), "/", y / pracma::gcd(x,y)))
)
) %>%
select(-x, -y)
# Year Person Office count frac
# <int> <chr> <chr> <dbl> <chr>
# 1 2005 Peter "Boston" 0 0
# 2 2008 Peter "Chicago" 0.333 1/3
# 3 2009 Peter "New York" 0.5 1/2
# 4 2003 Amy "Seattle " 0 0
# 5 2004 Amy "Boston" 1 1
# 6 2006 Amy "Chicago" 0.667 2/3
data:
cities <- read.delim(text = "Year,Person,Office
2005,Peter,Boston
2007,Peter,Boston
2008,Peter,Chicago
2009,Peter,New York
2011,Peter,New York
2003,Amy,Seattle
2004,Amy,Boston
2006,Amy,Chicago
2007,Amy,Chicago", sep = ",")
Upvotes: 1
Reputation: 16998
You could use
library(dplyr)
df %>%
group_by(Person, Office) %>%
slice_min(Year) %>%
arrange(Year) %>%
add_count() %>%
group_by(Person) %>%
mutate(Count = if_else(cumsum(n) == 1, 0, (cumsum(n) - 1) / (Year - first(Year))),
.keep = "unused") %>%
ungroup()
This returns
# A tibble: 6 x 3
Person Office Count
<chr> <chr> <dbl>
1 Amy Seattle 0
2 Amy Boston 1
3 Peter Boston 0
4 Amy Chicago 0.667
5 Peter Chicago 0.333
6 Peter New_York 0.5
Upvotes: 3