Reputation: 9
I have a large dataset as sampled below, and I want to convert daily data for each country to monthly while having the averages of total_cases_per_million and stringency_index.
head(datacovid)
location date total_cases_per_million stringency_index
Afghanistan 24/02/2020 0.026 8.33
Colombia 25/02/2020 0.026 8.33
Democratic Republic of Congo 26/02/2020 0.026 8.33
India 27/02/2020 0.026 8.33
Iraq 28/02/2020 0.026 8.33
Lebanon 29/02/2020 0.026 8.33
The structure is as follows:
str(datacovid) 'data.frame': 78444 obs. of 4 variables: $ location : chr "Afghanistan" "Colombia" "Democratic Republic of Congo" "India" ... $ date : Factor w/ 455 levels "01/01/2020","01/01/2021",..: 348 363 378 393 408 423 5 20 35 50 ... $ total_cases_per_million: num 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.051 0.103 ... $ stringency_index : num 8.33 8.33 8.33 8.33 8.33 ...
I have attemped to use below but failed. Many thanks in advance.
datacovid$date = as.Date(datacovid$date, "%d/%m/%Y")
datacovid$month = format(datacovid$date, "%b-%Y")
datacovid <- aggregate(cbind(total_cases_per_million, stringency_index) ~ month + location, datacovid, mean)
P.S. I am a very R beginner.
Upvotes: 0
Views: 442
Reputation: 165
You stored the date column as a factor. You could either read this column as a date type or convert it to date format in R.
For the sample data:
location <- c('Afghanistan', 'Colombia', ' Democratic Republic of Congo', 'India', 'Iraq', 'Lebanon', 'Lebanon')
date <- factor(c('24/02/2020', '25/02/2020', '26/02/2020', '27/02/2020', '28/02/2020', '26/02/2020', '27/02/2020'))
total_cases_per_million <- c(0.026, 0.026, 0.026, 0.026, 0.026, 0.026, 0.052)
stringency_index <- c(8.33, 8.33, 8.33, 8.33, 8.33, 8.33, 10.00)
datacovid <- data.frame(location, date, total_cases_per_million, stringency_index)
You can get the monthly averages for total_cases_per_million and stringency_index for each country, first by converting the date column to a date
format and then you can use dplyr
's group_by
function.
datacovid$date = as.Date(datacovid$date, format = "%d/%m/%Y")
library(dplyr)
datacovid %>%
mutate(month = format(date, "%m")) %>%
group_by(location, month) %>%
summarise(avg_total_cases_per_million=mean(total_cases_per_million), avg_stringency_index=mean(stringency_index))
This yields the output:
or you can use the lubridate package to extract the month from the date which does this neatly:
library(lubridate)
datacovid %>%
mutate(month = month(date)) %>%
group_by(location, month) %>%
summarise(avg_total_cases_per_million=mean(total_cases_per_million), avg_stringency_index=mean(stringency_index))
Upvotes: 0