ViSa

Reputation: 2225

lag() function in R not working as expected to calculate daily values

I am using a COVID data set and tried to create lagged values, which I will then use to calculate daily cases. However, lag() is not working as expected, and I am not sure where I am going wrong.

df

df_confirmed_gathered %>% 
  mutate(Cases_Dates = ymd(Cases_Dates)) %>% 
  group_by(Country.Region, Cases_Dates) %>% 
  filter(Country.Region == "Italy")
Country.Region    Lat    Long   Cases_Dates  Cases_Counts
<chr>            <dbl>   <dbl>  <date>       <int>

Italy   41.87194    12.56738    2020-02-01  2
Italy   41.87194    12.56738    2020-02-02  2
Italy   41.87194    12.56738    2020-02-03  2
Italy   41.87194    12.56738    2020-02-04  2
Italy   41.87194    12.56738    2020-02-05  2
Italy   41.87194    12.56738    2020-02-06  2
Italy   41.87194    12.56738    2020-02-07  3
Italy   41.87194    12.56738    2020-02-08  3
Italy   41.87194    12.56738    2020-02-09  3
Italy   41.87194    12.56738    2020-02-10  3

Calculating lag

df_confirmed_gathered %>% 
  mutate(Cases_Dates = ymd(Cases_Dates)) %>% 
  group_by(Country.Region, Cases_Dates) %>% 
  mutate(lag_Cases = lag(Cases_Counts, default = 0)) %>%
  filter(Country.Region == "Italy") 
Country.Region    Lat    Long   Cases_Dates  Cases_Counts  lag_Cases
<chr>            <dbl>   <dbl>  <date>       <int>         <dbl>

Italy   41.87194    12.56738    2020-02-01  2   0
Italy   41.87194    12.56738    2020-02-02  2   0
Italy   41.87194    12.56738    2020-02-03  2   0
Italy   41.87194    12.56738    2020-02-04  2   0
Italy   41.87194    12.56738    2020-02-05  2   0
Italy   41.87194    12.56738    2020-02-06  2   0
Italy   41.87194    12.56738    2020-02-07  3   0
Italy   41.87194    12.56738    2020-02-08  3   0
Italy   41.87194    12.56738    2020-02-09  3   0
Italy   41.87194    12.56738    2020-02-10  3   0
 

Calculating Daily Cases using lag function

df_confirmed_gathered %>% 
  mutate(Cases_Dates = ymd(Cases_Dates)) %>%
  group_by(Country.Region, Cases_Dates) %>% 
  mutate(Daily_Cases = Cases_Counts - lag(Cases_Counts, default = 0)) %>% 
  ungroup() %>% 
  filter(Country.Region == "Italy")
Country.Region    Lat    Long   Cases_Dates  Cases_Counts  Daily_Cases
<chr>            <dbl>   <dbl>  <date>       <int>         <dbl>

Italy   41.87194    12.56738    2020-02-01  2   2
Italy   41.87194    12.56738    2020-02-02  2   2
Italy   41.87194    12.56738    2020-02-03  2   2
Italy   41.87194    12.56738    2020-02-04  2   2
Italy   41.87194    12.56738    2020-02-05  2   2
Italy   41.87194    12.56738    2020-02-06  2   2
Italy   41.87194    12.56738    2020-02-07  3   3
Italy   41.87194    12.56738    2020-02-08  3   3
Italy   41.87194    12.56738    2020-02-09  3   3
Italy   41.87194    12.56738    2020-02-10  3   3

Upvotes: 0

Views: 680

Answers (1)

pseudospin

Reputation: 2767

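Drop Cases_Dates from the group_by() and lag() should work properly. With Cases_Dates in the grouping, each group for Italy holds a single row, so lag() has no previous value to return and falls back to the default of 0. If you have multiple Lat and Long values per country, add those to the grouping as well.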
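
A minimal sketch of that fix, assuming a small stand-in for df_confirmed_gathered built from the Italy rows in the question (the Lat and Long columns are left out for brevity, and the dates are created with as.Date() rather than ymd()). The arrange() step is an extra precaution that is not part of the original code; it makes sure the rows are in date order before lagging.

```r
library(dplyr)

# Stand-in for df_confirmed_gathered, using the Italy values from the question.
df_confirmed_gathered <- tibble(
  Country.Region = "Italy",
  Cases_Dates    = as.Date("2020-02-01") + 0:9,
  Cases_Counts   = c(2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)
)

daily <- df_confirmed_gathered %>%
  # Group by country only, so each group spans all dates for that country.
  group_by(Country.Region) %>%
  # Precaution (assumption): sort by date within each country before lagging.
  arrange(Cases_Dates, .by_group = TRUE) %>%
  mutate(
    lag_Cases   = lag(Cases_Counts, default = 0L),  # previous day's cumulative count
    Daily_Cases = Cases_Counts - lag_Cases          # new cases reported that day
  ) %>%
  ungroup()

print(daily)
```

With the grouping on Country.Region alone, lag_Cases carries the previous day's cumulative count, so Daily_Cases is nonzero only on days where the count changes (and on the first row, where the default of 0 is used).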

Upvotes: 1
