Reputation: 17
I have some cumulative data on covid-19 cases for countries and i am trying to calculate the difference in a new column called Diff. I can't remove the NA values because it wouldn't show the dates when there were no tests carried out. So i have made it so that if there is an NA value, to set the Diff value to 0 to indicate there was no difference, hence no tests conducted that day.
I am also trying to make a statement which says that if Diff is also NA, indicating that there was no tests conducted the day before, then to set the difference to the confirmed cases value for that day.
As you can see from my results at the bottom, i am almost there but i am creating a new column called ifelse. I tried to fix this but i think there is a simple error i am making somewhere. If anyone could point it out to me i would really appreciate it, thank you.
Edit: I realised i made a logical error with my thinking about setting the daily cases to confirmed cases when the lag calculation = NA because this is giving a misleading answer.
I used the below code on the large dataset to fill down and repeat the previous values when NAs appear. I filtered by group so as not to simply propagate forward values across countries.
I then calculated the lag and then used Ronak Shah's code to get the daily values.
data <- data %>%
group_by(CountryName) %>%
fill(ConfirmedCases, .direction = "down")
data <- data %>%
mutate(lag1 = ConfirmedCases - lag(ConfirmedCases))
data <- data %>% mutate(DailyCases = replace_na(coalesce(lag1, ConfirmedCases), 0))
library(tidyverse)
data <- data.frame(
stringsAsFactors = FALSE,
CountryName = c("Afghanistan","Afghanistan",
"Afghanistan","Afghanistan","Afghanistan",
"Afghanistan","Afghanistan",
"Afghanistan","Afghanistan","Afghanistan",
"Afghanistan","Afghanistan","Afghanistan",
"Afghanistan","Afghanistan",
"Afghanistan","Afghanistan","Afghanistan",
"Afghanistan","Afghanistan","Afghanistan",
"Afghanistan","Afghanistan",
"Afghanistan","Afghanistan","Afghanistan",
"Afghanistan","Afghanistan","Afghanistan",
"Afghanistan","Afghanistan"),
ConfirmedCases = c(NA,7L,NA,NA,NA,10L,16L,21L,
22L,22L,22L,24L,24L,34L,40L,42L,
75L,75L,91L,106L,114L,141L,166L,
192L,235L,235L,270L,299L,337L,367L,
423L),
Diff = c(NA,NA,NA,NA,NA,NA,6L,5L,1L,
0L,0L,2L,0L,10L,6L,2L,33L,0L,16L,
15L,8L,27L,25L,26L,43L,0L,35L,
29L,38L,30L,56L)
)
data2 <- data %>%
mutate(Diff = ifelse(is.na(ConfirmedCases) == TRUE, 0, ConfirmedCases - lag(ConfirmedCases)),
ifelse(is.na((ConfirmedCases - lag(ConfirmedCases))) == TRUE, ConfirmedCases, ConfirmedCases - lag(ConfirmedCases)))
head(data2, 10)
#> CountryName ConfirmedCases Diff ifelse(...)
#> 1 Afghanistan NA 0 NA
#> 2 Afghanistan 7 NA 7
#> 3 Afghanistan NA 0 NA
#> 4 Afghanistan NA 0 NA
#> 5 Afghanistan NA 0 NA
#> 6 Afghanistan 10 NA 10
#> 7 Afghanistan 16 6 6
#> 8 Afghanistan 21 5 5
#> 9 Afghanistan 22 1 1
#> 10 Afghanistan 22 0 0
Created on 2020-08-15 by the reprex package (v0.3.0)
Upvotes: 0
Views: 105
Reputation: 388972
I think you can use coalesce
to get first non-NA value from Diff
and ConfirmedCases
and if both of them are NA
replace it with 0.
library(dplyr)
data %>%
mutate(Diff2 = tidyr::replace_na(coalesce(Diff, ConfirmedCases), 0))
# CountryName ConfirmedCases Diff Diff2
#1 Afghanistan NA NA 0
#2 Afghanistan 7 NA 7
#3 Afghanistan NA NA 0
#4 Afghanistan NA NA 0
#5 Afghanistan NA NA 0
#6 Afghanistan 10 NA 10
#7 Afghanistan 16 6 6
#8 Afghanistan 21 5 5
#9 Afghanistan 22 1 1
#10 Afghanistan 22 0 0
#11 Afghanistan 22 0 0
#12 Afghanistan 24 2 2
#...
#...
Upvotes: 1
Reputation: 39595
Maybe this can help by creating a duplicate of your target column:
library(tidyverse)
data %>% mutate(D=ConfirmedCases,D=ifelse(is.na(D),0,D),
Diff2 = c(0,diff(D)),Diff2=ifelse(Diff2<0,0,Diff2)) %>% select(-D)
Output:
CountryName ConfirmedCases Diff Diff2
1 Afghanistan NA NA 0
2 Afghanistan 7 NA 7
3 Afghanistan NA NA 0
4 Afghanistan NA NA 0
5 Afghanistan NA NA 0
6 Afghanistan 10 NA 10
7 Afghanistan 16 6 6
8 Afghanistan 21 5 5
9 Afghanistan 22 1 1
10 Afghanistan 22 0 0
11 Afghanistan 22 0 0
12 Afghanistan 24 2 2
13 Afghanistan 24 0 0
14 Afghanistan 34 10 10
15 Afghanistan 40 6 6
16 Afghanistan 42 2 2
17 Afghanistan 75 33 33
18 Afghanistan 75 0 0
19 Afghanistan 91 16 16
20 Afghanistan 106 15 15
21 Afghanistan 114 8 8
22 Afghanistan 141 27 27
23 Afghanistan 166 25 25
24 Afghanistan 192 26 26
25 Afghanistan 235 43 43
26 Afghanistan 235 0 0
27 Afghanistan 270 35 35
28 Afghanistan 299 29 29
29 Afghanistan 337 38 38
30 Afghanistan 367 30 30
31 Afghanistan 423 56 56
Upvotes: 1