Ümit Seven
Ümit Seven

Reputation: 9

Aggregate daily data with means

I have a large dataset as sampled below, and I want to convert daily data to monthly while having the averages of total_cases_per_million and stringency_index.

head(datacovid)
 location                 date                         total_cases_per_million    stringency_index
 Afghanistan              24/02/2020                   0.026                      8.33
 Colombia                 25/02/2020                   0.026                      8.33
 Democratic Republic of Congo 26/02/2020               0.026                      8.33
 India                    27/02/2020                   0.026                      8.33
 Iraq                     28/02/2020                   0.026                      8.33
 Lebanon                  29/02/2020                   0.026                      8.33

The structure is as follows:

str(datacovid) 'data.frame': 78444 obs. of 4 variables: $ location : chr "Afghanistan" "Colombia" "Democratic Republic of Congo" "India" ... $ date : Factor w/ 455 levels "01/01/2020","01/01/2021",..: 348 363 378 393 408 423 5 20 35 50 ... $ total_cases_per_million: num 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.051 0.103 ... $ stringency_index : num 8.33 8.33 8.33 8.33 8.33 ...

I have attemped to use "mutate" but failed. Many thanks in advance. P.S. I am a very R beginner.

Upvotes: 0

Views: 727

Answers (2)

You can easily do that using lubridate to transform the date column into date format and tidyverse to group and summarise. Here's the commented code:

library(tidyverse)
library(lubridate)

df %>%
  # Transform date column into date format
  mutate_at(vars(date), function(x) dmy(as.character(x))) %>%
  # Extract month from date
  mutate(var_month = month(date)) %>%
  # Group by month
  group_by(var_month) %>%
  # Calculate the average for cases and stringency, grouped by month
  summarise(avg_monthly_cases = mean(total_cases_per_million),
            avg_stringency = mean(stringency_index))

# A tibble: 1 x 3
#  var_month avg_monthly_cases avg_stringency
#      <dbl>             <dbl>          <dbl>
#1         2             0.026           8.33

Upvotes: 0

Eyayaw
Eyayaw

Reputation: 1081

The main thing is to group the data by month. Converting the date variable into date type would help.

df = read.table(text = r'{location                 date                         total_cases_per_million    stringency_index
 Afghanistan              24/02/2020                   0.026                      8.33
 Colombia                 25/02/2020                   0.026                      8.33
 'Democratic Republic of Congo' 26/02/2020               0.026                      8.33
 India                    27/02/2020                   0.026                      8.33
 Iraq                     28/02/2020                   0.026                      8.33
 Lebanon                  29/02/2020                   0.026                      8.33}', header=T)

df$date = as.Date(df$date, "%d/%m/%Y")
df$month  = format(df$date, "%b")
aggregate(cbind(total_cases_per_million, stringency_index) ~ month, df, mean)

# month total_cases_per_million stringency_index
# 1   Feb                   0.026             8.33

Upvotes: 0

Related Questions