Reputation: 1078
I am working with a COVID dataset, and I need a day counter for each country that starts on the first day the virus appeared in that country.
This is an example of my data
And this is my desired result
I have been trying with this code:
data1 <- data1 %>%
  arrange(country, Date) %>%
  group_by(country) %>%
  mutate(Counter = Date - first(Date) + 1)
But this just gives me a counter that starts at the first row of each country. How can I make day 1 the first day on which confirmed is at least 1?
Here is the example data:
structure(list(Date = structure(c(1577836800, 1577923200, 1578009600,
1578096000, 1578182400, 1578268800, 1578355200, 1578441600, 1577836800,
1577923200, 1578009600, 1578096000, 1578182400, 1578268800, 1578355200,
1578441600, 1577836800, 1577923200, 1578009600, 1578096000, 1578182400,
1578268800, 1578355200, 1578441600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), country = c("Afganistan", "Afganistan", "Afganistan",
"Afganistan", "Afganistan", "Afganistan", "Afganistan", "Afganistan",
"Colombia", "Colombia", "Colombia", "Colombia", "Colombia", "Colombia",
"Colombia", "Colombia", "France", "France", "France", "France",
"France", "France", "France", "France"), confirmed = c(0, 0,
0, 0, 0, 1, 1, 2, 0, 0, 1, 1, 2, 3, 3, 3, 0, 0, 0, 0, 0, 1, 1,
1)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"
))
Upvotes: 0
Views: 64
Reputation: 30474
To get the first Date within a country group where the number of confirmed cases is greater than 0, you can try Date[which(confirmed > 0)][1]. For dates on or after that first confirmed date, you can calculate the counter by taking the difference, similar to what you had tried; earlier dates get 0.
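To see what that indexing expression does on its own, here is a minimal base R sketch on a single made-up country. The vectors below are hypothetical and only for illustration; the sketch assumes the dates are already sorted in ascending order, which is what the arrange() step takes care of in the full pipeline.

```r
# Hypothetical single-country vectors, for illustration only
Date <- as.Date("2020-01-01") + 0:4
confirmed <- c(0, 0, 1, 1, 2)

# which() returns the positions where the condition holds (here 3, 4, 5)
idx <- which(confirmed > 0)

# Indexing Date by those positions and taking [1] gives the earliest match,
# provided Date is sorted in ascending order
first_confirmed <- Date[idx][1]

# If no row has confirmed > 0, idx is empty and the result is NA
none <- Date[which(c(0, 0, 0, 0, 0) > 0)][1]
```

Note the NA case: a country with no confirmed cases at all would get NA for first_confirmed, and the counter built from it would be NA as well.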
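library(dplyr)

df %>%
  arrange(country, Date) %>%
  group_by(country) %>%
  mutate(
    # first date in each country with at least one confirmed case
    first_confirmed = Date[which(confirmed > 0)][1],
    # days since that date (starting at 1); 0 before it
    counter = ifelse(Date >= first_confirmed, Date - first_confirmed + 1, 0)
  )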
Output
   Date       country    confirmed first_confirmed counter
   <date>     <chr>          <dbl> <date>            <dbl>
 1 2020-01-01 Afganistan         0 2020-01-06            0
 2 2020-01-02 Afganistan         0 2020-01-06            0
 3 2020-01-03 Afganistan         0 2020-01-06            0
 4 2020-01-04 Afganistan         0 2020-01-06            0
 5 2020-01-05 Afganistan         0 2020-01-06            0
 6 2020-01-06 Afganistan         1 2020-01-06            1
 7 2020-01-07 Afganistan         1 2020-01-06            2
 8 2020-01-08 Afganistan         2 2020-01-06            3
 9 2020-01-01 Colombia           0 2020-01-03            0
10 2020-01-02 Colombia           0 2020-01-03            0
11 2020-01-03 Colombia           1 2020-01-03            1
12 2020-01-04 Colombia           1 2020-01-03            2
13 2020-01-05 Colombia           2 2020-01-03            3
14 2020-01-06 Colombia           3 2020-01-03            4
15 2020-01-07 Colombia           3 2020-01-03            5
16 2020-01-08 Colombia           3 2020-01-03            6
17 2020-01-01 France             0 2020-01-06            0
18 2020-01-02 France             0 2020-01-06            0
19 2020-01-03 France             0 2020-01-06            0
20 2020-01-04 France             0 2020-01-06            0
21 2020-01-05 France             0 2020-01-06            0
22 2020-01-06 France             1 2020-01-06            1
23 2020-01-07 France             1 2020-01-06            2
24 2020-01-08 France             1 2020-01-06            3
Data
df <- structure(list(Date = structure(c(18262, 18263, 18264, 18265,
18266, 18267, 18268, 18269, 18262, 18263, 18264, 18265, 18266,
18267, 18268, 18269, 18262, 18263, 18264, 18265, 18266, 18267,
18268, 18269), class = "Date"), country = c("Afganistan", "Afganistan",
"Afganistan", "Afganistan", "Afganistan", "Afganistan", "Afganistan",
"Afganistan", "Colombia", "Colombia", "Colombia", "Colombia",
"Colombia", "Colombia", "Colombia", "Colombia", "France", "France",
"France", "France", "France", "France", "France", "France"),
confirmed = c(0, 0, 0, 0, 0, 1, 1, 2, 0, 0, 1, 1, 2, 3, 3,
3, 0, 0, 0, 0, 0, 1, 1, 1)), class = "data.frame", row.names = c(NA,
-24L))
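One thing to watch: the data in the question stores Date as POSIXct, while the data above uses the Date class. Subtracting two POSIXct values gives a difftime whose units are chosen automatically, so the result is not guaranteed to be in days. A possible way around that, sketched here under the assumption that the time of day does not matter, is to convert to Date first and make the day difference an explicit number. The small data1 below is a made-up stand-in, not the full dataset from the question.

```r
library(dplyr)

# Made-up stand-in for the question's data: Date stored as POSIXct (UTC)
data1 <- data.frame(
  Date = as.POSIXct("2020-01-01", tz = "UTC") + 86400 * rep(0:3, times = 2),
  country = rep(c("Colombia", "France"), each = 4),
  confirmed = c(0, 1, 1, 2, 0, 0, 0, 1)
)

result <- data1 %>%
  mutate(Date = as.Date(Date)) %>%   # drop the time component
  arrange(country, Date) %>%
  group_by(country) %>%
  mutate(
    first_confirmed = Date[which(confirmed > 0)][1],
    # the difference of two Date values is in days; as.numeric() makes that explicit
    counter = ifelse(Date >= first_confirmed,
                     as.numeric(Date - first_confirmed) + 1, 0)
  ) %>%
  ungroup()
```

The ungroup() at the end is optional; it just avoids carrying the grouping into later steps.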
Upvotes: 1