Henri
Henri

Reputation: 1235

Find the first value that meet a defined criteria

I have a dataset on Coivd-19 cases and deaths by day and country. I wish to find the date when the first death occured for every country, and the filter away all the preceding days. How would you tackle this problem in R/Tidyverse?

library(readxl)
library(httr)
url <- paste("https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-",format(Sys.time(), "%Y-%m-%d"), ".xlsx", sep = "")
GET(url, authenticate(":", ":", type="ntlm"), write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(tf)

Upvotes: 1

Views: 92

Answers (2)

Robbie Still
Robbie Still

Reputation: 11

library(dplyr)

df <- df %>% 
      arrange(DateRep) %>% 
      distinct(`Countries and territories`, .keep_all = TRUE)

I think this is the simplest option for you. dplyr::distinct() removes all but the first row, so if you arrange the data by the desired column you'll get the first match only. .keep_all = TRUE will keep all columns, not just those specified in distinct().

Upvotes: 0

nurandi
nurandi

Reputation: 1618

Try this:

library(dplyr) 
# or library(tidyverse)

df %>%
  arrange(`Countries and territories`, DateRep) %>%
  group_by(`Countries and territories`) %>%
  mutate(Cumulative_Death = cumsum(Deaths)) %>%
  ungroup() %>%
  filter(Cumulative_Death > 0) %>%
  group_by(`Countries and territories`) %>%
  mutate(First_Death_Date = min(DateRep)) 

It adds new columns Cumulative_Death : sum of death up-to DateRep and First_Death_Date: date when the first death occured for every country

Upvotes: 1

Related Questions