mean_34
mean_34

Reputation: 13

How to calculate mean age for tenure position having a number of different positions and people holding them?

I have a data frame with 4 variables: "position", "year start", "year finish", "year birth". I need to calculate a mean age for a person holding all position in a given year (sum of all ages in a given year over the number of ages, to be clear).

So, the final result must be like this:

year mean age
2000 45
2001 43
2002 34

Here is the example of data:

position year_start year_finish year_birth
A 2021 2024 1948
A 2013 2021 1945
A 2006 2012 1957
A 2002 2005 1956
A 1999 2001 1935
B 2007 2024 1959
B 2003 2006 1946
C 2016 2024 1955
C 2010 2015 1949
C 2007 2009 1952
C 2003 2006 1927

Upvotes: 1

Views: 91

Answers (2)

Jay Bee
Jay Bee

Reputation: 572

Probably a cleaner way to do it, but:

# Load libraries.

library(tidyverse)

# Load data.

df <- data.frame(
  position = c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'),
  year_start = c(2021, 2013, 2006, 2002, 1999, 2007, 2003, 2016, 2010, 2007, 2003),
  year_finish = c(2024, 2021, 2012, 2005, 2001, 2024, 2006, 2024, 2015, 2009, 2006),
  year_birth = c(1948, 1945, 1957, 1956, 1935, 1959, 1946, 1955, 1949, 1952, 1927)
)

# Get list of years.

df2 <- df %>%
  mutate(year_list = mapply(seq, year_start, year_finish))

# Calculate ages based on that list of years.

df3 <- df2 %>% 
  mutate(y = map2(year_birth, year_list, function(x, y){
    y - x
    }))

# Generate mean age by year data.

df4 <- df3 %>% 
  select(year_list, y) %>%
  unnest(cols = c(year_list, y))

names(df4) <- c("year", "age")

mean_age_data <- df4 %>% 
  group_by(year) %>% 
  summarise(mean_age = mean(age))

mean_age_data

With output:

A tibble: 26 × 2
year    mean_age
<dbl>   <dbl>
1999    64.0
2000    65.0
2001    66.0
2002    46.0
2003    60.0
2004    61.0
2005    62.0
2006    62.7
2007    51.0
2008    52.0
2009    53.0
2010    55.0
2011    56.0
2012    57.0
2013    62.0
2014    63.0
2015    64.0
2016    63.0
2017    64.0
2018    65.0
2019    66.0
2020    67.0
2021    69.2
2022    68.0
2023    69.0
2024    70.0

Upvotes: 0

TarJae
TarJae

Reputation: 79194

We could do it this way:

library(dplyr)
library(tidyr)
library(purrr)

df %>% 
  mutate(year = map2(year_start, year_finish, seq),
         id = row_number()) %>% 
  unnest(year) %>% 
  group_by(position, year_start) %>% 
  mutate(mean_age = mean(year - year_birth, na.rm = TRUE)) %>% 
  select(-id, -year) %>% 
  slice(1) %>% 
  ungroup()
position year_start year_finish year_birth mean_age
   <chr>         <int>       <int>      <int>    <dbl>
 1 A              1999        2001       1935     65  
 2 A              2002        2005       1956     47.5
 3 A              2006        2012       1957     52  
 4 A              2013        2021       1945     72  
 5 A              2021        2024       1948     74.5
 6 B              2003        2006       1946     58.5
 7 B              2007        2024       1959     56.5
 8 C              2003        2006       1927     77.5
 9 C              2007        2009       1952     56  
10 C              2010        2015       1949     63.5
11 C              2016        2024       1955     65  

Upvotes: 0

Related Questions