Pedro Cardoso
Pedro Cardoso

Reputation: 79

Create a new column with averages for time intervals

I have a dataset that begins in year 1988 and ends in year 2020. I want to create averages for certain time intervals. For example, 5 years: 1988-1992, 1993-1997 and so on. But I want a new column with these averages.

For example, suppose I have this:

anos <- 1988:2020
valores <- c(15, 18, 20, NA, 25, 27, 28, NA, 32, 35, 36, 38, 40, 
             42, 45, 46, NA, 50, 52, 55, 56, 58, 60, NA, 65, 66, 
             68, 70, 72, 75, 76, 78, 80)

dataset <- data.frame(Ano = anos, Valor = valores)

I want to have this:

anos <- 1988:2020

valores <- c(15, 18, 20, NA, 25, 27, 28, NA, 32, 35, 36, 38, 40, 
             42, 45, 46, NA, 50, 52, 55, 56, 58, 60, NA, 65, 66, 
             68, 70, 72, 75, 76, 78, 80)

medias <- c(19.5, 19.5, 19.5, 19.5, 19.5,
            30.5, 30.5, 30.5, 30.5, 30.5,
            40.2, 40.2, 40.2, 40.2, 40.2,
            50.75, 50.75, 50.75, 50.75, 50.75,
            59.75, 59.75, 59.75, 59.75, 59.75,
            70.2, 70.2, 70.2, 70.2, 70.2,
            78, 78, 78)
            
dataset <- data.frame(Ano = anos, Valor = valores, Medias = medias)

There are five year averages for each time interval and the mean value repeats itself until the next five years. The last mean is an average of 3 values since the overall period is not a multiple of five.

EDIT: also, I will need to use group_by().

Upvotes: 0

Views: 54

Answers (6)

s_baldur
s_baldur

Reputation: 33603

Base R:

mean_every_n <- function(x, n=5L) {
  grp <- (seq_along(x) - 1 ) %/% n
  ave(x, grp, FUN = \(y) mean(y, na.rm = TRUE))
}
dataset$medias <- mean_every_5(dataset$Valor)


> head(dataset, 11)
#     Ano Valor medias
# 1  1988    15   19.5
# 2  1989    18   19.5
# 3  1990    20   19.5
# 4  1991    NA   19.5
# 5  1992    25   19.5
# 6  1993    27   30.5
# 7  1994    28   30.5
# 8  1995    NA   30.5
# 9  1996    32   30.5
# 10 1997    35   30.5
# 11 1998    36   40.2

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 102625

You can try

dataset %>%
    mutate(grp = ceiling(row_number() / 5)) %>%
    mutate(Medias = mean(Valor, na.rm = TRUE), .by = grp) %>%
    select(-grp)

which gives

    Ano Valor Medias
1  1988    15  19.50
2  1989    18  19.50
3  1990    20  19.50
4  1991    NA  19.50
5  1992    25  19.50
6  1993    27  30.50
7  1994    28  30.50
8  1995    NA  30.50
9  1996    32  30.50
10 1997    35  30.50
11 1998    36  40.20
12 1999    38  40.20
13 2000    40  40.20
14 2001    42  40.20
15 2002    45  40.20
16 2003    46  50.75
17 2004    NA  50.75
18 2005    50  50.75
19 2006    52  50.75
20 2007    55  50.75
21 2008    56  59.75
22 2009    58  59.75
23 2010    60  59.75
24 2011    NA  59.75
25 2012    65  59.75
26 2013    66  70.20
27 2014    68  70.20
28 2015    70  70.20
29 2016    72  70.20
30 2017    75  70.20
31 2018    76  78.00
32 2019    78  78.00
33 2020    80  78.00

Upvotes: 0

Andre Wildberg
Andre Wildberg

Reputation: 19191

Using cut to create the grouping

library(dplyr)

dataset %>% 
  group_by(grp = cut(Ano, seq(min(Ano), max(Ano), 5), right=F)) %>% 
  mutate(medias = mean(Valor, na.rm=T)) %>% 
  ungroup() %>% 
  select(-grp)
# A tibble: 33 × 3
     Ano Valor medias
   <int> <dbl>  <dbl>
 1  1988    15   19.5
 2  1989    18   19.5
 3  1990    20   19.5
 4  1991    NA   19.5
 5  1992    25   19.5
 6  1993    27   30.5
 7  1994    28   30.5
 8  1995    NA   30.5
 9  1996    32   30.5
10  1997    35   30.5
# ℹ 23 more rows
# ℹ Use `print(n = ...)` to see more rows

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389235

Here's a dplyr version :

library(dplyr)
n <- 5

dataset %>%
  arrange(Ano) %>%
  group_by(Group = floor((Ano - first(Ano))/n)) %>%
  mutate(Medias = mean(Valor, na.rm = TRUE)) %>%
  data.frame()

which returns the following -

#    Ano Valor Group Medias
#1  1988    15     0  19.50
#2  1989    18     0  19.50
#3  1990    20     0  19.50
#4  1991    NA     0  19.50
#5  1992    25     0  19.50
#6  1993    27     1  30.50
#7  1994    28     1  30.50
#8  1995    NA     1  30.50
#9  1996    32     1  30.50
#10 1997    35     1  30.50
#11 1998    36     2  40.20
#12 1999    38     2  40.20
#13 2000    40     2  40.20
#14 2001    42     2  40.20
#15 2002    45     2  40.20
#16 2003    46     3  50.75
#17 2004    NA     3  50.75
#18 2005    50     3  50.75
#19 2006    52     3  50.75
#20 2007    55     3  50.75
#21 2008    56     4  59.75
#22 2009    58     4  59.75
#23 2010    60     4  59.75
#24 2011    NA     4  59.75
#25 2012    65     4  59.75
#26 2013    66     5  70.20
#27 2014    68     5  70.20
#28 2015    70     5  70.20
#29 2016    72     5  70.20
#30 2017    75     5  70.20
#31 2018    76     6  78.00
#32 2019    78     6  78.00
#33 2020    80     6  78.00

In the example that you have shared you have a single entry for each year and data for all the year is present however, this will also work when :

  1. You have more than 1 rows for a year.
  2. You don't have data for all the years.

Upvotes: 1

langtang
langtang

Reputation: 24845

With data.table you can do as follows:

library(data.table)
setDT(dataset)[, Medias:=mean(Valor, na.rm=T), floor((Ano-3)/5)]

Output:

      Ano Valor Medias
    <int> <num>  <num>
 1:  1988    15  19.50
 2:  1989    18  19.50
 3:  1990    20  19.50
 4:  1991    NA  19.50
 5:  1992    25  19.50
 6:  1993    27  30.50
 7:  1994    28  30.50
 8:  1995    NA  30.50
 9:  1996    32  30.50
10:  1997    35  30.50
11:  1998    36  40.20
12:  1999    38  40.20
13:  2000    40  40.20
14:  2001    42  40.20
15:  2002    45  40.20
16:  2003    46  50.75
17:  2004    NA  50.75
18:  2005    50  50.75
19:  2006    52  50.75
20:  2007    55  50.75
21:  2008    56  59.75
22:  2009    58  59.75
23:  2010    60  59.75
24:  2011    NA  59.75
25:  2012    65  59.75
26:  2013    66  70.20
27:  2014    68  70.20
28:  2015    70  70.20
29:  2016    72  70.20
30:  2017    75  70.20
31:  2018    76  78.00
32:  2019    78  78.00
33:  2020    80  78.00
      Ano Valor Medias

Upvotes: 1

user2974951
user2974951

Reputation: 10385

Take the ceiling remainder of division by 5

ave(
  dataset$Valor,
  ceiling(seq_along(rownames(dataset))/5),
  FUN=function(x){mean(x,na.rm=T)}
)

    Ano Valor Medias media
1  1988    15  19.50 19.50
2  1989    18  19.50 19.50
3  1990    20  19.50 19.50
4  1991    NA  19.50 19.50
5  1992    25  19.50 19.50
6  1993    27  30.50 30.50
7  1994    28  30.50 30.50
8  1995    NA  30.50 30.50
9  1996    32  30.50 30.50
10 1997    35  30.50 30.50
11 1998    36  40.20 40.20
12 1999    38  40.20 40.20
13 2000    40  40.20 40.20
14 2001    42  40.20 40.20
15 2002    45  40.20 40.20
16 2003    46  50.75 50.75
17 2004    NA  50.75 50.75
18 2005    50  50.75 50.75
19 2006    52  50.75 50.75
20 2007    55  50.75 50.75
21 2008    56  59.75 59.75
22 2009    58  59.75 59.75
23 2010    60  59.75 59.75
24 2011    NA  59.75 59.75
25 2012    65  59.75 59.75
26 2013    66  70.20 70.20
27 2014    68  70.20 70.20
28 2015    70  70.20 70.20
29 2016    72  70.20 70.20
30 2017    75  70.20 70.20
31 2018    76  78.00 78.00
32 2019    78  78.00 78.00
33 2020    80  78.00 78.00

Upvotes: 0

Related Questions