Reputation: 87
I am looking to create a new data frame of averages based off a population data frame. Column A is a list of ages followed by the population of each for a number of years. I am wondering what the best approach to creating a new data frame with the average age for each year. The formula I am using to calculate the average is sum(age*year)/sum(year)
For example this is what I have:
Age | 2000 | 2001 | 2002
------------------------
2 | 4 | 1 | 2
3 | 6 | 3 | 5
4 | 10 | 9 | 8
and I would like a new data frame like this:
Year| Avg_age
--------------
2000 | 3.3
2001 | 3.6
2002 | 3.4
Upvotes: 2
Views: 866
Reputation: 887118
We can reshape to 'long' format and get the summarise
d output grouped by 'Year'
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(cols = -Age, names_to = 'Year') %>%
group_by(Year) %>%
summarise(Avg_age = sum(Age * value)/sum(value), .groups = 'drop')
-output
# A tibble: 3 x 2
# Year Avg_age
#* <chr> <dbl>
#1 2000 3.3
#2 2001 3.62
#3 2002 3.4
Or it can be done in reverse as well i.e. first get the summarised output and then do the reshaping
df1 %>%
summarise(across(-Age, ~ sum(Age * .)/sum(.))) %>%
pivot_longer(cols = everything(), names_to = 'Year',
values_to = 'Avg_age')
# A tibble: 3 x 2
# Year Avg_age
# <chr> <dbl>
#1 2000 3.3
#2 2001 3.62
#3 2002 3.4
Or using dapply
with stack
library(collapse)
stack(dapply(df1[-1], function(x) sum(x * df1$Age)/sum(x)))[2:1]
df1 <- structure(list(Age = 2:4, `2000` = c(4, 6, 10), `2001` = c(1,
3, 9), `2002` = c(2, 5, 8)), row.names = c(NA, -3L), class = "data.frame")
Upvotes: 2