Reputation: 9
I have a large dataset as sampled below, and I want to convert daily data to monthly while having the averages of total_cases_per_million and stringency_index.
head(datacovid)
location date total_cases_per_million stringency_index
Afghanistan 24/02/2020 0.026 8.33
Colombia 25/02/2020 0.026 8.33
Democratic Republic of Congo 26/02/2020 0.026 8.33
India 27/02/2020 0.026 8.33
Iraq 28/02/2020 0.026 8.33
Lebanon 29/02/2020 0.026 8.33
The structure is as follows:
str(datacovid) 'data.frame': 78444 obs. of 4 variables: $ location : chr "Afghanistan" "Colombia" "Democratic Republic of Congo" "India" ... $ date : Factor w/ 455 levels "01/01/2020","01/01/2021",..: 348 363 378 393 408 423 5 20 35 50 ... $ total_cases_per_million: num 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.051 0.103 ... $ stringency_index : num 8.33 8.33 8.33 8.33 8.33 ...
I have attemped to use "mutate" but failed. Many thanks in advance. P.S. I am a very R beginner.
Upvotes: 0
Views: 727
Reputation: 5620
You can easily do that using lubridate
to transform the date column into date format and tidyverse
to group and summarise. Here's the commented code:
library(tidyverse)
library(lubridate)
df %>%
# Transform date column into date format
mutate_at(vars(date), function(x) dmy(as.character(x))) %>%
# Extract month from date
mutate(var_month = month(date)) %>%
# Group by month
group_by(var_month) %>%
# Calculate the average for cases and stringency, grouped by month
summarise(avg_monthly_cases = mean(total_cases_per_million),
avg_stringency = mean(stringency_index))
# A tibble: 1 x 3
# var_month avg_monthly_cases avg_stringency
# <dbl> <dbl> <dbl>
#1 2 0.026 8.33
Upvotes: 0
Reputation: 1081
The main thing is to group the data by month
. Converting the date variable into date type would help.
df = read.table(text = r'{location date total_cases_per_million stringency_index
Afghanistan 24/02/2020 0.026 8.33
Colombia 25/02/2020 0.026 8.33
'Democratic Republic of Congo' 26/02/2020 0.026 8.33
India 27/02/2020 0.026 8.33
Iraq 28/02/2020 0.026 8.33
Lebanon 29/02/2020 0.026 8.33}', header=T)
df$date = as.Date(df$date, "%d/%m/%Y")
df$month = format(df$date, "%b")
aggregate(cbind(total_cases_per_million, stringency_index) ~ month, df, mean)
# month total_cases_per_million stringency_index
# 1 Feb 0.026 8.33
Upvotes: 0