Reputation: 79
I have a dataset that begins in year 1988 and ends in year 2020. I want to create averages for certain time intervals. For example, 5 years: 1988-1992, 1993-1997 and so on. But I want a new column with these averages.
For example, suppose I have this:
anos <- 1988:2020
valores <- c(15, 18, 20, NA, 25, 27, 28, NA, 32, 35, 36, 38, 40,
42, 45, 46, NA, 50, 52, 55, 56, 58, 60, NA, 65, 66,
68, 70, 72, 75, 76, 78, 80)
dataset <- data.frame(Ano = anos, Valor = valores)
I want to have this:
anos <- 1988:2020
valores <- c(15, 18, 20, NA, 25, 27, 28, NA, 32, 35, 36, 38, 40,
42, 45, 46, NA, 50, 52, 55, 56, 58, 60, NA, 65, 66,
68, 70, 72, 75, 76, 78, 80)
medias <- c(19.5, 19.5, 19.5, 19.5, 19.5,
30.5, 30.5, 30.5, 30.5, 30.5,
40.2, 40.2, 40.2, 40.2, 40.2,
50.75, 50.75, 50.75, 50.75, 50.75,
59.75, 59.75, 59.75, 59.75, 59.75,
70.2, 70.2, 70.2, 70.2, 70.2,
78, 78, 78)
dataset <- data.frame(Ano = anos, Valor = valores, Medias = medias)
There are five year averages for each time interval and the mean value repeats itself until the next five years. The last mean is an average of 3 values since the overall period is not a multiple of five.
EDIT: also, I will need to use group_by()
.
Upvotes: 0
Views: 54
Reputation: 33603
Base R:
mean_every_n <- function(x, n=5L) {
grp <- (seq_along(x) - 1 ) %/% n
ave(x, grp, FUN = \(y) mean(y, na.rm = TRUE))
}
dataset$medias <- mean_every_5(dataset$Valor)
> head(dataset, 11)
# Ano Valor medias
# 1 1988 15 19.5
# 2 1989 18 19.5
# 3 1990 20 19.5
# 4 1991 NA 19.5
# 5 1992 25 19.5
# 6 1993 27 30.5
# 7 1994 28 30.5
# 8 1995 NA 30.5
# 9 1996 32 30.5
# 10 1997 35 30.5
# 11 1998 36 40.2
Upvotes: 1
Reputation: 102625
You can try
dataset %>%
mutate(grp = ceiling(row_number() / 5)) %>%
mutate(Medias = mean(Valor, na.rm = TRUE), .by = grp) %>%
select(-grp)
which gives
Ano Valor Medias
1 1988 15 19.50
2 1989 18 19.50
3 1990 20 19.50
4 1991 NA 19.50
5 1992 25 19.50
6 1993 27 30.50
7 1994 28 30.50
8 1995 NA 30.50
9 1996 32 30.50
10 1997 35 30.50
11 1998 36 40.20
12 1999 38 40.20
13 2000 40 40.20
14 2001 42 40.20
15 2002 45 40.20
16 2003 46 50.75
17 2004 NA 50.75
18 2005 50 50.75
19 2006 52 50.75
20 2007 55 50.75
21 2008 56 59.75
22 2009 58 59.75
23 2010 60 59.75
24 2011 NA 59.75
25 2012 65 59.75
26 2013 66 70.20
27 2014 68 70.20
28 2015 70 70.20
29 2016 72 70.20
30 2017 75 70.20
31 2018 76 78.00
32 2019 78 78.00
33 2020 80 78.00
Upvotes: 0
Reputation: 19191
Using cut
to create the grouping
library(dplyr)
dataset %>%
group_by(grp = cut(Ano, seq(min(Ano), max(Ano), 5), right=F)) %>%
mutate(medias = mean(Valor, na.rm=T)) %>%
ungroup() %>%
select(-grp)
# A tibble: 33 × 3
Ano Valor medias
<int> <dbl> <dbl>
1 1988 15 19.5
2 1989 18 19.5
3 1990 20 19.5
4 1991 NA 19.5
5 1992 25 19.5
6 1993 27 30.5
7 1994 28 30.5
8 1995 NA 30.5
9 1996 32 30.5
10 1997 35 30.5
# ℹ 23 more rows
# ℹ Use `print(n = ...)` to see more rows
Upvotes: 0
Reputation: 389235
Here's a dplyr
version :
library(dplyr)
n <- 5
dataset %>%
arrange(Ano) %>%
group_by(Group = floor((Ano - first(Ano))/n)) %>%
mutate(Medias = mean(Valor, na.rm = TRUE)) %>%
data.frame()
which returns the following -
# Ano Valor Group Medias
#1 1988 15 0 19.50
#2 1989 18 0 19.50
#3 1990 20 0 19.50
#4 1991 NA 0 19.50
#5 1992 25 0 19.50
#6 1993 27 1 30.50
#7 1994 28 1 30.50
#8 1995 NA 1 30.50
#9 1996 32 1 30.50
#10 1997 35 1 30.50
#11 1998 36 2 40.20
#12 1999 38 2 40.20
#13 2000 40 2 40.20
#14 2001 42 2 40.20
#15 2002 45 2 40.20
#16 2003 46 3 50.75
#17 2004 NA 3 50.75
#18 2005 50 3 50.75
#19 2006 52 3 50.75
#20 2007 55 3 50.75
#21 2008 56 4 59.75
#22 2009 58 4 59.75
#23 2010 60 4 59.75
#24 2011 NA 4 59.75
#25 2012 65 4 59.75
#26 2013 66 5 70.20
#27 2014 68 5 70.20
#28 2015 70 5 70.20
#29 2016 72 5 70.20
#30 2017 75 5 70.20
#31 2018 76 6 78.00
#32 2019 78 6 78.00
#33 2020 80 6 78.00
In the example that you have shared you have a single entry for each year and data for all the year is present however, this will also work when :
Upvotes: 1
Reputation: 24845
With data.table
you can do as follows:
library(data.table)
setDT(dataset)[, Medias:=mean(Valor, na.rm=T), floor((Ano-3)/5)]
Output:
Ano Valor Medias
<int> <num> <num>
1: 1988 15 19.50
2: 1989 18 19.50
3: 1990 20 19.50
4: 1991 NA 19.50
5: 1992 25 19.50
6: 1993 27 30.50
7: 1994 28 30.50
8: 1995 NA 30.50
9: 1996 32 30.50
10: 1997 35 30.50
11: 1998 36 40.20
12: 1999 38 40.20
13: 2000 40 40.20
14: 2001 42 40.20
15: 2002 45 40.20
16: 2003 46 50.75
17: 2004 NA 50.75
18: 2005 50 50.75
19: 2006 52 50.75
20: 2007 55 50.75
21: 2008 56 59.75
22: 2009 58 59.75
23: 2010 60 59.75
24: 2011 NA 59.75
25: 2012 65 59.75
26: 2013 66 70.20
27: 2014 68 70.20
28: 2015 70 70.20
29: 2016 72 70.20
30: 2017 75 70.20
31: 2018 76 78.00
32: 2019 78 78.00
33: 2020 80 78.00
Ano Valor Medias
Upvotes: 1
Reputation: 10385
Take the ceiling remainder of division by 5
ave(
dataset$Valor,
ceiling(seq_along(rownames(dataset))/5),
FUN=function(x){mean(x,na.rm=T)}
)
Ano Valor Medias media
1 1988 15 19.50 19.50
2 1989 18 19.50 19.50
3 1990 20 19.50 19.50
4 1991 NA 19.50 19.50
5 1992 25 19.50 19.50
6 1993 27 30.50 30.50
7 1994 28 30.50 30.50
8 1995 NA 30.50 30.50
9 1996 32 30.50 30.50
10 1997 35 30.50 30.50
11 1998 36 40.20 40.20
12 1999 38 40.20 40.20
13 2000 40 40.20 40.20
14 2001 42 40.20 40.20
15 2002 45 40.20 40.20
16 2003 46 50.75 50.75
17 2004 NA 50.75 50.75
18 2005 50 50.75 50.75
19 2006 52 50.75 50.75
20 2007 55 50.75 50.75
21 2008 56 59.75 59.75
22 2009 58 59.75 59.75
23 2010 60 59.75 59.75
24 2011 NA 59.75 59.75
25 2012 65 59.75 59.75
26 2013 66 70.20 70.20
27 2014 68 70.20 70.20
28 2015 70 70.20 70.20
29 2016 72 70.20 70.20
30 2017 75 70.20 70.20
31 2018 76 78.00 78.00
32 2019 78 78.00 78.00
33 2020 80 78.00 78.00
Upvotes: 0