O. MacDonald
O. MacDonald

Reputation: 17

mutate function with nested ifelse statements creating two columns instead of one

I have some cumulative data on covid-19 cases for countries and i am trying to calculate the difference in a new column called Diff. I can't remove the NA values because it wouldn't show the dates when there were no tests carried out. So i have made it so that if there is an NA value, to set the Diff value to 0 to indicate there was no difference, hence no tests conducted that day.

I am also trying to make a statement which says that if Diff is also NA, indicating that there was no tests conducted the day before, then to set the difference to the confirmed cases value for that day.

As you can see from my results at the bottom, i am almost there but i am creating a new column called ifelse. I tried to fix this but i think there is a simple error i am making somewhere. If anyone could point it out to me i would really appreciate it, thank you.

Edit: I realised i made a logical error with my thinking about setting the daily cases to confirmed cases when the lag calculation = NA because this is giving a misleading answer.

I used the below code on the large dataset to fill down and repeat the previous values when NAs appear. I filtered by group so as not to simply propagate forward values across countries.

I then calculated the lag and then used Ronak Shah's code to get the daily values.

data <- data %>%
            group_by(CountryName) %>%
            fill(ConfirmedCases, .direction = "down")

data <- data %>%
            mutate(lag1 = ConfirmedCases - lag(ConfirmedCases))

data <- data %>% mutate(DailyCases = replace_na(coalesce(lag1, ConfirmedCases), 0))


library(tidyverse)

data <- data.frame(
          stringsAsFactors = FALSE,
                        CountryName = c("Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan"),
                     ConfirmedCases = c(NA,7L,NA,NA,NA,10L,16L,21L,
                                        22L,22L,22L,24L,24L,34L,40L,42L,
                                        75L,75L,91L,106L,114L,141L,166L,
                                        192L,235L,235L,270L,299L,337L,367L,
                                        423L),
                               Diff = c(NA,NA,NA,NA,NA,NA,6L,5L,1L,
                                        0L,0L,2L,0L,10L,6L,2L,33L,0L,16L,
                                        15L,8L,27L,25L,26L,43L,0L,35L,
                                        29L,38L,30L,56L)
                 )

data2 <- data %>%
  mutate(Diff = ifelse(is.na(ConfirmedCases) == TRUE, 0, ConfirmedCases - lag(ConfirmedCases)),
                       ifelse(is.na((ConfirmedCases - lag(ConfirmedCases))) == TRUE, ConfirmedCases, ConfirmedCases - lag(ConfirmedCases)))

head(data2, 10)
#>    CountryName ConfirmedCases Diff ifelse(...)
#> 1  Afghanistan             NA    0          NA
#> 2  Afghanistan              7   NA           7
#> 3  Afghanistan             NA    0          NA
#> 4  Afghanistan             NA    0          NA
#> 5  Afghanistan             NA    0          NA
#> 6  Afghanistan             10   NA          10
#> 7  Afghanistan             16    6           6
#> 8  Afghanistan             21    5           5
#> 9  Afghanistan             22    1           1
#> 10 Afghanistan             22    0           0

Created on 2020-08-15 by the reprex package (v0.3.0)

Upvotes: 0

Views: 105

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388972

I think you can use coalesce to get first non-NA value from Diff and ConfirmedCases and if both of them are NA replace it with 0.

library(dplyr)
data %>%
  mutate(Diff2 = tidyr::replace_na(coalesce(Diff,  ConfirmedCases), 0))

#   CountryName ConfirmedCases Diff Diff2
#1  Afghanistan             NA   NA     0
#2  Afghanistan              7   NA     7
#3  Afghanistan             NA   NA     0
#4  Afghanistan             NA   NA     0
#5  Afghanistan             NA   NA     0
#6  Afghanistan             10   NA    10
#7  Afghanistan             16    6     6
#8  Afghanistan             21    5     5
#9  Afghanistan             22    1     1
#10 Afghanistan             22    0     0
#11 Afghanistan             22    0     0
#12 Afghanistan             24    2     2
#...
#...

Upvotes: 1

Duck
Duck

Reputation: 39595

Maybe this can help by creating a duplicate of your target column:

library(tidyverse)

data %>% mutate(D=ConfirmedCases,D=ifelse(is.na(D),0,D),
                Diff2 = c(0,diff(D)),Diff2=ifelse(Diff2<0,0,Diff2)) %>% select(-D)

Output:

   CountryName ConfirmedCases Diff Diff2
1  Afghanistan             NA   NA     0
2  Afghanistan              7   NA     7
3  Afghanistan             NA   NA     0
4  Afghanistan             NA   NA     0
5  Afghanistan             NA   NA     0
6  Afghanistan             10   NA    10
7  Afghanistan             16    6     6
8  Afghanistan             21    5     5
9  Afghanistan             22    1     1
10 Afghanistan             22    0     0
11 Afghanistan             22    0     0
12 Afghanistan             24    2     2
13 Afghanistan             24    0     0
14 Afghanistan             34   10    10
15 Afghanistan             40    6     6
16 Afghanistan             42    2     2
17 Afghanistan             75   33    33
18 Afghanistan             75    0     0
19 Afghanistan             91   16    16
20 Afghanistan            106   15    15
21 Afghanistan            114    8     8
22 Afghanistan            141   27    27
23 Afghanistan            166   25    25
24 Afghanistan            192   26    26
25 Afghanistan            235   43    43
26 Afghanistan            235    0     0
27 Afghanistan            270   35    35
28 Afghanistan            299   29    29
29 Afghanistan            337   38    38
30 Afghanistan            367   30    30
31 Afghanistan            423   56    56

Upvotes: 1

Related Questions