new column based on two rows from consecutive days in R

Question

I have a following problem. I computed average temperature per country and also a difference between the actual daily temperature and the average temperature. See code below:

df1 <- data.frame(country = c("01", "01", "01","01", "01", "02", "02" , "03", "03","03"),
                  date = c("2020-01-01", "2020-01-02", "2020-01-03" , "2020-01-05", "2020-01-07", "2020-01-01", "2020-01-03", "2020-01-02",  "2020-01-03", "2020-01-04"),
                  temperature = c(4, 3, -2, 0.1, -3, 1.5, 12, 10, 7, 5),
                  blabla = c(23, 41, 32, 8, 50, 27, 8, 7, 6, 12)
)


library(dplyr)

df2 <- df1 %>%
  group_by(country) %>%
  mutate(mean_per_country = mean(temperature))


df2$difference <- df2$temperature - df2$mean_per_country

Now I need to create a new column that checks if (unlimited number of) consecutive days in the same country have negative, or positive difference between the actual daily temperature and the average temperature. Is there an elegant way how can I do it in R?

Desired output is here:

desired_df <- data.frame(country = c("01", "01", "01","01", "01", "02", "02" , "03", "03","03"),
                         date = c("2020-01-01", "2020-01-02", "2020-01-03" , "2020-01-05", "2020-01-07", "2020-01-01", "2020-01-03", "2020-01-02",  "2020-01-03", "2020-01-04"),
                         temperature = c(4, 3, -2, 2, -3, 1.5, 12, 10, 7, 5),
                         blabla = c(23, 41, 32, 8, 50, 27, 8, 7, 6, 12),
                         mean_per_country = c(0.42, 0.42, 0.42, 0.42, 0.42, 6.75, 6.75, 7.33, 7.33, 7.33),
                         difference = c(3.58, 2.58, -2.42 , -0.32, -3.42 , -5.25, 5.25, 2.67, -0.333, -2.33),
                         new_column = c("hot",
                                        "hot",
                                        "", #day interrupted, therefor not "cold"
                                        "", #day interrupted, therefor not "cold"
                                        "", #day interrupted, therefor not "cold"
                                        "", 
                                        "", 
                                        "", 
                                        "cold",
                                        "cold")
                         
)

Thank you very much

Ian Campbell · Accepted Answer

Here's an approach with dplyr:

library(dplyr)
df2 %>%
   group_by(country) %>%
   mutate(date = as.Date(date),
          consecutive = date - lag(date) == 1,
          result = (sign(difference) == sign(lead(difference)) & lead(consecutive) |
                   (sign(difference) == sign(lag(difference)) & consecutive)),
          new_column = c("cold",NA_character_,"hot")[result * sign(difference) + 2])
# A tibble: 10 x 9
# Groups:   country [3]
   country date       temperature blabla mean_per_country difference consecutive result new_column
                                                    
 1 01      2020-01-01         4       23             0.42      3.58  NA          TRUE   hot       
 2 01      2020-01-02         3       41             0.42      2.58  TRUE        TRUE   hot       
 3 01      2020-01-03        -2       32             0.42     -2.42  TRUE        FALSE  NA        
 4 01      2020-01-05         0.1      8             0.42     -0.32  FALSE       FALSE  NA        
 5 01      2020-01-07        -3       50             0.42     -3.42  FALSE       NA     NA        
 6 02      2020-01-01         1.5     27             6.75     -5.25  NA          NA     NA        
 7 02      2020-01-03        12        8             6.75      5.25  FALSE       NA     NA        
 8 03      2020-01-02        10        7             7.33      2.67  NA          NA     NA        
 9 03      2020-01-03         7        6             7.33     -0.333 TRUE        TRUE   cold      
10 03      2020-01-04         5       12             7.33     -2.33  TRUE        TRUE   cold

To get rid of the intermediate columns that I left there for illustration purposes, just user select(-(consecutive:result)).

new column based on two rows from consecutive days in R

Answers (2)

Related Questions