Reputation: 13
I have a data frame with 4 variables: "position", "year start", "year finish", "year birth". I need to calculate a mean age for a person holding all position in a given year (sum of all ages in a given year over the number of ages, to be clear).
So, the final result must be like this:
year | mean age |
---|---|
2000 | 45 |
2001 | 43 |
2002 | 34 |
Here is the example of data:
position | year_start | year_finish | year_birth |
---|---|---|---|
A | 2021 | 2024 | 1948 |
A | 2013 | 2021 | 1945 |
A | 2006 | 2012 | 1957 |
A | 2002 | 2005 | 1956 |
A | 1999 | 2001 | 1935 |
B | 2007 | 2024 | 1959 |
B | 2003 | 2006 | 1946 |
C | 2016 | 2024 | 1955 |
C | 2010 | 2015 | 1949 |
C | 2007 | 2009 | 1952 |
C | 2003 | 2006 | 1927 |
Upvotes: 1
Views: 91
Reputation: 572
Probably a cleaner way to do it, but:
# Load libraries.
library(tidyverse)
# Load data.
df <- data.frame(
position = c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'),
year_start = c(2021, 2013, 2006, 2002, 1999, 2007, 2003, 2016, 2010, 2007, 2003),
year_finish = c(2024, 2021, 2012, 2005, 2001, 2024, 2006, 2024, 2015, 2009, 2006),
year_birth = c(1948, 1945, 1957, 1956, 1935, 1959, 1946, 1955, 1949, 1952, 1927)
)
# Get list of years.
df2 <- df %>%
mutate(year_list = mapply(seq, year_start, year_finish))
# Calculate ages based on that list of years.
df3 <- df2 %>%
mutate(y = map2(year_birth, year_list, function(x, y){
y - x
}))
# Generate mean age by year data.
df4 <- df3 %>%
select(year_list, y) %>%
unnest(cols = c(year_list, y))
names(df4) <- c("year", "age")
mean_age_data <- df4 %>%
group_by(year) %>%
summarise(mean_age = mean(age))
mean_age_data
With output:
A tibble: 26 × 2
year mean_age
<dbl> <dbl>
1999 64.0
2000 65.0
2001 66.0
2002 46.0
2003 60.0
2004 61.0
2005 62.0
2006 62.7
2007 51.0
2008 52.0
2009 53.0
2010 55.0
2011 56.0
2012 57.0
2013 62.0
2014 63.0
2015 64.0
2016 63.0
2017 64.0
2018 65.0
2019 66.0
2020 67.0
2021 69.2
2022 68.0
2023 69.0
2024 70.0
Upvotes: 0
Reputation: 79194
We could do it this way:
library(dplyr)
library(tidyr)
library(purrr)
df %>%
mutate(year = map2(year_start, year_finish, seq),
id = row_number()) %>%
unnest(year) %>%
group_by(position, year_start) %>%
mutate(mean_age = mean(year - year_birth, na.rm = TRUE)) %>%
select(-id, -year) %>%
slice(1) %>%
ungroup()
position year_start year_finish year_birth mean_age
<chr> <int> <int> <int> <dbl>
1 A 1999 2001 1935 65
2 A 2002 2005 1956 47.5
3 A 2006 2012 1957 52
4 A 2013 2021 1945 72
5 A 2021 2024 1948 74.5
6 B 2003 2006 1946 58.5
7 B 2007 2024 1959 56.5
8 C 2003 2006 1927 77.5
9 C 2007 2009 1952 56
10 C 2010 2015 1949 63.5
11 C 2016 2024 1955 65
Upvotes: 0