Reputation: 25
df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df$countryName = as.character(df$countryName)
I processed the dataset.
How can I find the countries that reported the most cases, deaths and recoveries on a given day?
Example:
For 1 June 2020: the country that reported the most deaths, the country that reported the most cases, and the country that reported the most recoveries.
Upvotes: 0
Views: 138
Reputation: 736
The code below uses the dplyr R package to create a data frame called records that contains the data you want. Make sure dplyr is installed by running install.packages("dplyr") in R or RStudio.
## call the dplyr library
library(dplyr)
## read in your data to R
df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv', stringsAsFactors = FALSE)
## set the date you wish to query max records for
set.date <- "2020-06-01"
## copy the data to preserve the original
df1 <- df
## filter the records to only those that match the specified date
## (the date column in this CSV is named "day" and is formatted as YYYY/MM/DD)
df1 <- filter(df1, as.Date(day, "%Y/%m/%d") == as.Date(set.date))
## determine which country had the most confirmed on the specified day
max.confirmed <- df1[which.max(df1$confirmed),]
## format the record to identify it as the record with most confirmed
max.confirmed$confirmed <- paste0("**",max.confirmed$confirmed,"**")
## determine which country had the most deaths on the specified day
max.deaths <- df1[which.max(df1$death),]
## format the record to identify it as the record with most deaths
max.deaths$death <- paste0("**",max.deaths$death,"**")
## determine which country had the most recovered on the specified day
max.recovered <- df1[which.max(df1$recovered),]
## format the record to identify it as the record with most recovered
max.recovered$recovered <- paste0("**",max.recovered$recovered,"**")
## create the records data frame to contain your max records
records <- rbind(max.confirmed, max.deaths, max.recovered)
You can change the date by replacing "2020-06-01" with the date you want to query the max confirmed, death and recovered records for. Make sure to use the "YYYY-MM-DD" format.
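To avoid editing the script for every date, the same steps could be wrapped in a function. The following is only a sketch in base R: max_records_on is a hypothetical helper name, and the toy data frame contains made-up numbers that merely reuse the column names from the CSV (day, countryName, confirmed, death, recovered).

```r
## toy data with the same column names as the CSV; the numbers are made up
toy <- data.frame(
  day         = c("2020/06/01", "2020/06/01", "2020/06/01", "2020/06/02"),
  countryName = c("A", "B", "C", "A"),
  confirmed   = c(100, 250, 80, 999),
  death       = c(9, 4, 1, 99),
  recovered   = c(20, 30, 60, 999),
  stringsAsFactors = FALSE
)

## hypothetical helper: return the top row per measure for one date
max_records_on <- function(data, query.date) {
  ## keep only the rows for the requested day
  one.day <- data[as.Date(data$day, "%Y/%m/%d") == as.Date(query.date), ]
  if (nrow(one.day) == 0) stop("no records for ", query.date)
  measures <- c("confirmed", "death", "recovered")
  ## for each measure, pick the row with the largest value
  top <- lapply(measures, function(m) {
    row <- one.day[which.max(one.day[[m]]), c("day", "countryName", m)]
    names(row)[3] <- "value"
    data.frame(measure = m, row, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}

records <- max_records_on(toy, "2020-06-01")
print(records)
```

With the real data you would pass df instead of toy. Note that which.max() returns only the first row when several countries tie for the maximum.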
Alternatively, instead of manually updating the code, you can use the readline() function to ask the user to input the date they would like to query max data for.
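A minimal sketch of the readline() approach is shown here. parse_query_date is a hypothetical helper name, and the fallback date is just the example date used above. Because readline() only waits for input in an interactive session, the sketch falls back to a fixed date when run as a script.

```r
## hypothetical helper: validate a date typed as "YYYY-MM-DD"
parse_query_date <- function(input) {
  parsed <- as.Date(input, format = "%Y-%m-%d")
  if (is.na(parsed)) stop("please use the YYYY-MM-DD format, e.g. 2020-06-01")
  parsed
}

## readline() returns immediately with "" in a non-interactive session,
## so only prompt when someone is actually at the console
if (interactive()) {
  set.date <- parse_query_date(readline(prompt = "Date to query (YYYY-MM-DD): "))
} else {
  set.date <- parse_query_date("2020-06-01")
}
```

The resulting set.date can then be used in the filter step in place of the hard-coded string.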
ADDED (based on comments): If you want to use today's data (or, if today's data is not available, the most recent data), you can use the code below:
## call the dplyr library
library(dplyr)
## read the data into R
df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv', stringsAsFactors = FALSE)
## determine the max date contained within the data
max.date <- df[which.max(as.Date(df$day)),"day"]
## copy the data to preserve original
df1 <- df
## filter the data to only entries from the max day
## (the date column in this CSV is named "day")
df1 <- filter(df1, as.Date(day, "%Y/%m/%d") == as.Date(max.date))
## determine the entry with the most deaths
max.deaths <- df1[which.max(df1$death),]
## format the number of deaths as given in the example
max.deaths$death <- paste0("**",max.deaths$death,"**")
## determine the entry with the most recovered
max.recovered <- df1[which.max(df1$recovered),]
## format the number recovered to match the format of the example
max.recovered$recovered <- paste0("**",max.recovered$recovered,"**")
## create a data frame containing our max death and max recovered entries
max.records <- rbind(max.deaths, max.recovered)
## attach a column with the max date which corresponds to the date of the entries selected
max.records$date <- max.date
## organize the data as shown in the example
max.records <- select(max.records, c("day","countryName","death","recovered"))
I hope this helps!
Upvotes: 1