Jorge Paredes

Reputation: 1078

Conditional counters in R

I am working with a COVID dataset, and I need a day counter for each country that starts on the first day the virus appeared in that country.

This is an example of my data:

[image not available]

And this is my desired result:

[image not available]

I have been trying with this code:

data1 <- data1 %>%
  arrange(country, Date) %>%
  group_by(country) %>%
  mutate(Counter = Date - first(Date) + 1)

But this just gives me a counter that starts at the first date in the data. How can I make day 1 the day on which confirmed reaches 1 for the first time?

Here is the example data:

structure(list(Date = structure(c(1577836800, 1577923200, 1578009600, 
1578096000, 1578182400, 1578268800, 1578355200, 1578441600, 1577836800, 
1577923200, 1578009600, 1578096000, 1578182400, 1578268800, 1578355200, 
1578441600, 1577836800, 1577923200, 1578009600, 1578096000, 1578182400, 
1578268800, 1578355200, 1578441600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), country = c("Afganistan", "Afganistan", "Afganistan", 
"Afganistan", "Afganistan", "Afganistan", "Afganistan", "Afganistan", 
"Colombia", "Colombia", "Colombia", "Colombia", "Colombia", "Colombia", 
"Colombia", "Colombia", "France", "France", "France", "France", 
"France", "France", "France", "France"), confirmed = c(0, 0, 
0, 0, 0, 1, 1, 2, 0, 0, 1, 1, 2, 3, 3, 3, 0, 0, 0, 0, 0, 1, 1, 
1)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"
))

Upvotes: 0

Views: 64

Answers (1)

Ben

Reputation: 30474

To get the first Date within each country group where the number of confirmed cases is greater than 0, you can use Date[which(confirmed > 0)][1]. For dates on or after that first confirmed date, the counter is the difference from that date plus 1, similar to what you had tried; earlier dates get 0.

library(dplyr)

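df %>%
  arrange(country, Date) %>%
  group_by(country) %>%
  # first date in each country with at least one confirmed case
  mutate(first_confirmed = Date[which(confirmed > 0)][1],
         # day 1 is the first confirmed date; earlier dates get 0
         counter = ifelse(Date >= first_confirmed, Date - first_confirmed + 1, 0))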

Output

   Date       country    confirmed first_confirmed counter
   <date>     <chr>          <dbl> <date>            <dbl>
 1 2020-01-01 Afganistan         0 2020-01-06            0
 2 2020-01-02 Afganistan         0 2020-01-06            0
 3 2020-01-03 Afganistan         0 2020-01-06            0
 4 2020-01-04 Afganistan         0 2020-01-06            0
 5 2020-01-05 Afganistan         0 2020-01-06            0
 6 2020-01-06 Afganistan         1 2020-01-06            1
 7 2020-01-07 Afganistan         1 2020-01-06            2
 8 2020-01-08 Afganistan         2 2020-01-06            3
 9 2020-01-01 Colombia           0 2020-01-03            0
10 2020-01-02 Colombia           0 2020-01-03            0
11 2020-01-03 Colombia           1 2020-01-03            1
12 2020-01-04 Colombia           1 2020-01-03            2
13 2020-01-05 Colombia           2 2020-01-03            3
14 2020-01-06 Colombia           3 2020-01-03            4
15 2020-01-07 Colombia           3 2020-01-03            5
16 2020-01-08 Colombia           3 2020-01-03            6
17 2020-01-01 France             0 2020-01-06            0
18 2020-01-02 France             0 2020-01-06            0
19 2020-01-03 France             0 2020-01-06            0
20 2020-01-04 France             0 2020-01-06            0
21 2020-01-05 France             0 2020-01-06            0
22 2020-01-06 France             1 2020-01-06            1
23 2020-01-07 France             1 2020-01-06            2
24 2020-01-08 France             1 2020-01-06            3
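
To see what Date[which(confirmed > 0)][1] does on its own, here is a minimal base R sketch using only Colombia's eight rows from the example. The variable names (dates, hits) are just for illustration and are not part of the answer's code.

```r
# Colombia's eight rows from the example data
dates <- as.Date("2020-01-01") + 0:7
confirmed <- c(0, 0, 1, 1, 2, 3, 3, 3)

# which() returns the positions where the condition holds
hits <- which(confirmed > 0)

# Subsetting and taking [1] keeps only the earliest matching date
# (this relies on the dates already being sorted)
first_confirmed <- dates[hits][1]

# Same calculation as in the mutate() call, as plain numbers
counter <- ifelse(dates >= first_confirmed,
                  as.numeric(dates - first_confirmed) + 1,
                  0)
counter
# 0 0 1 2 3 4 5 6
```

If a country never has a confirmed case, which() returns an empty vector and the [1] yields NA, so first_confirmed and the counter would both be NA for that group.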

Data

df <- structure(list(Date = structure(c(18262, 18263, 18264, 18265, 
18266, 18267, 18268, 18269, 18262, 18263, 18264, 18265, 18266, 
18267, 18268, 18269, 18262, 18263, 18264, 18265, 18266, 18267, 
18268, 18269), class = "Date"), country = c("Afganistan", "Afganistan", 
"Afganistan", "Afganistan", "Afganistan", "Afganistan", "Afganistan", 
"Afganistan", "Colombia", "Colombia", "Colombia", "Colombia", 
"Colombia", "Colombia", "Colombia", "Colombia", "France", "France", 
"France", "France", "France", "France", "France", "France"), 
    confirmed = c(0, 0, 0, 0, 0, 1, 1, 2, 0, 0, 1, 1, 2, 3, 3, 
    3, 0, 0, 0, 0, 0, 1, 1, 1)), class = "data.frame", row.names = c(NA, 
-24L))
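
Note that the data in the question stores Date as POSIXct, while the data above uses the Date class. Subtracting POSIXct values gives a difftime whose units R picks automatically, so the result may not be in days. The following is a sketch of one possible variant, not the only way to do it: it converts to Date first and uses coalesce() so that a country with no confirmed cases gets 0 instead of NA. The sample data, including the country "Nowhere", is made up for illustration.

```r
library(dplyr)

# Small POSIXct sample in the shape of the question's data (values are made up)
data1 <- data.frame(
  Date = as.POSIXct("2020-01-01", tz = "UTC") + 86400 * rep(0:3, times = 2),
  country = rep(c("Colombia", "Nowhere"), each = 4),
  confirmed = c(0, 1, 1, 2, 0, 0, 0, 0)
)

result <- data1 %>%
  mutate(Date = as.Date(Date)) %>%   # drop the time of day so differences are in days
  arrange(country, Date) %>%
  group_by(country) %>%
  mutate(
    first_confirmed = Date[which(confirmed > 0)][1],
    # if_else() returns NA when first_confirmed is NA; coalesce() turns that into 0
    counter = coalesce(
      if_else(Date >= first_confirmed,
              as.numeric(Date - first_confirmed) + 1,
              0),
      0
    )
  ) %>%
  ungroup()

result$counter
# 0 1 2 3 0 0 0 0
```

Whether a country without cases should show 0 or NA depends on how the counter is used later; drop the coalesce() call to keep NA.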

Upvotes: 1
