Ümit Seven
Ümit Seven

Reputation: 9

Grouping daily data by month with means

I have a large dataset as sampled below, and I want to convert daily data for each country to monthly while having the averages of total_cases_per_million and stringency_index.

head(datacovid)
 location                 date                         total_cases_per_million    stringency_index
 Afghanistan              24/02/2020                   0.026                      8.33
 Colombia                 25/02/2020                   0.026                      8.33
 Democratic Republic of Congo 26/02/2020               0.026                      8.33
 India                    27/02/2020                   0.026                      8.33
 Iraq                     28/02/2020                   0.026                      8.33
 Lebanon                  29/02/2020                   0.026                      8.33

The structure is as follows:

str(datacovid) 'data.frame': 78444 obs. of 4 variables: $ location : chr "Afghanistan" "Colombia" "Democratic Republic of Congo" "India" ... $ date : Factor w/ 455 levels "01/01/2020","01/01/2021",..: 348 363 378 393 408 423 5 20 35 50 ... $ total_cases_per_million: num 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.051 0.103 ... $ stringency_index : num 8.33 8.33 8.33 8.33 8.33 ...

I have attemped to use below but failed. Many thanks in advance.

datacovid$date =  as.Date(datacovid$date, "%d/%m/%Y")
datacovid$month =  format(datacovid$date, "%b-%Y")
datacovid <- aggregate(cbind(total_cases_per_million, stringency_index) ~ month + location, datacovid, mean)

P.S. I am a very R beginner.

Upvotes: 0

Views: 442

Answers (1)

Orkun Berk Yuzbasioglu
Orkun Berk Yuzbasioglu

Reputation: 165

You stored the date column as a factor. You could either read this column as a date type or convert it to date format in R.

For the sample data:

location <- c('Afghanistan', 'Colombia', ' Democratic Republic of Congo', 'India', 'Iraq', 'Lebanon', 'Lebanon')
date <- factor(c('24/02/2020', '25/02/2020', '26/02/2020', '27/02/2020', '28/02/2020', '26/02/2020', '27/02/2020'))
total_cases_per_million <- c(0.026, 0.026, 0.026, 0.026, 0.026, 0.026, 0.052)
stringency_index <- c(8.33, 8.33, 8.33, 8.33, 8.33, 8.33, 10.00)

datacovid <- data.frame(location, date, total_cases_per_million, stringency_index)

You can get the monthly averages for total_cases_per_million and stringency_index for each country, first by converting the date column to a date format and then you can use dplyr's group_by function.

datacovid$date = as.Date(datacovid$date, format = "%d/%m/%Y")

library(dplyr)

datacovid %>% 
    mutate(month = format(date, "%m")) %>%
        group_by(location, month) %>% 
            summarise(avg_total_cases_per_million=mean(total_cases_per_million), avg_stringency_index=mean(stringency_index))

This yields the output:

output

or you can use the lubridate package to extract the month from the date which does this neatly:

library(lubridate)

datacovid %>% 
    mutate(month = month(date)) %>%
        group_by(location, month) %>% 
            summarise(avg_total_cases_per_million=mean(total_cases_per_million), avg_stringency_index=mean(stringency_index))

Upvotes: 0

Related Questions