R Dataframe Regroup and Summarize

Question

I have a dataframe looks like below:

Year   Person   Office
2005   Peter    Boston 
2007   Peter    Boston
2008   Peter    Chicago 
2009   Peter    New York
2011   Peter    New York 
2003   Amy      Seattle 
2004   Amy      Boston 
2006   Amy      Chicago 
2007   Amy      Chicago

I want to calculate a office-person level normalized measure (count) that captures the number of offices that a person experienced before coming to the current one. The measure is normalized by the total amount of years before arriving at the current position. Below is the ideal output. For Peter, Boston is his first office and thus, his normalized measure count for Boston is 0. For Peter, Chicago is his second office and he spent 2008-2005=3 years before coming to the Chicago office. Thus, his normalized measure count for Chicago is 1/3.

Office    Person  Count
Boston    Peter   0
Boston    Amy     1
Chicago   Peter   1/3
Chicago   Amy     2/3
New York  Peter   1/2
Seattle   Amy     0

Martin Gal · Accepted Answer

You could use

library(dplyr)

df %>% 
  group_by(Person, Office) %>% 
  slice_min(Year) %>% 
  arrange(Year) %>% 
  add_count() %>% 
  group_by(Person) %>% 
  mutate(Count = if_else(cumsum(n) == 1, 0, (cumsum(n) - 1) / (Year - first(Year))),
         .keep = "unused") %>% 
  ungroup()

This returns

# A tibble: 6 x 3
  Person Office   Count
        
1 Amy    Seattle  0    
2 Amy    Boston   1    
3 Peter  Boston   0    
4 Amy    Chicago  0.667
5 Peter  Chicago  0.333
6 Peter  New_York 0.5

R Dataframe Regroup and Summarize

Answers (2)

Related Questions