B_to_the_ P

Reputation: 109

Better ways of conditionally updating information in columns

I am collecting information at regular intervals over the course of a year. The first collection point 't1' acts as a kind of reference level.

By default, if 'value' is 4 or above, the 'colour' column entry should be 'red'; if it is less than 4, it should be 'green'.

However, if at a follow-up collection point the value for a variable is 2 or more points greater than the value recorded at 't1', I want to update the colour column entry to 'blue'.

Please see the example of this below...

data <- tibble::tribble(
 ~parent, ~variable, ~value, ~colour,
    "t1",   "happy",     4L,   "red",
    "t2",   "happy",     5L,   "red",
    "t3",   "happy",     3L, "green",
    "t1",     "sad",     1L, "green",
    "t2",     "sad",     3L, "green",
    "t3",     "sad",     3L, "green"
 )

time <- c('t2', 't3')
my_vars <- c('happy', 'sad')

for (i in time) {
 for (x in my_vars){
   if (data$value[data$parent == i & data$variable == x] >= 
       data$value[data$parent == 't1' & data$variable == x] + 2) {
     data$colour[data$parent == i & data$variable == x] <- 'blue'
   } else {
     data$colour[data$parent == i & data$variable == x] <- data$colour[data$parent == i & data$variable == x]
   }
 }
}

This gives the output I want: the 't2' and 't3' rows for 'sad' are updated to 'blue', and every other row keeps its original colour.

Q: I'm looking for a more elegant way of achieving this. The data set I am actually using has longer column names, so the code is difficult to read and runs off my screen. I'd prefer to do this with dplyr functions, but my initial attempts failed and I returned to the more familiar code structure above.

Also, with real-world data I will have 20+ variables. I need to guard against NAs breaking the code (for example, if the t1 value is NA, the if() condition throws an error). I'm not sure how to handle this, as I'm not yet well versed in building in checks, so any pointers on that front would be gratefully received.

Thanks.

Upvotes: 1

Views: 54

Answers (1)

Ronak Shah

Reputation: 388982

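For each variable, you can compare every value against that variable's 't1' value with ifelse(). Within each group, match('t1', parent) finds the position of the 't1' row.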

library(dplyr)  

data %>%
  group_by(variable) %>%
  mutate(colour = ifelse(value - value[match('t1', parent)] >= 2, 'blue', colour)) %>%
  ungroup()

#  parent variable value colour
#  <chr>  <chr>    <int> <chr> 
#1 t1     happy        4 red   
#2 t2     happy        5 red   
#3 t3     happy        3 green 
#4 t1     sad          1 green 
#5 t2     sad          3 blue  
#6 t3     sad          3 blue  
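
On the NA part of the question: with the ifelse() version above, a missing 't1' value does not throw an error, but the comparison evaluates to NA, so colour becomes NA for every row of that variable. Below is a minimal sketch of one possible guard, assuming you would rather keep the existing colour whenever the comparison cannot be made. The data_na example, the baseline and diff helper columns, and the "keep the existing colour" policy are all assumptions for illustration.

```r
library(dplyr)

# Hypothetical variation of the example data: the 't1' value for 'sad' is
# missing, and 'happy' at 't2' is raised to 6 so that one row qualifies as 'blue'.
data_na <- tibble::tribble(
  ~parent, ~variable, ~value,      ~colour,
  "t1",    "happy",   4L,          "red",
  "t2",    "happy",   6L,          "red",
  "t3",    "happy",   3L,          "green",
  "t1",    "sad",     NA_integer_, NA_character_,
  "t2",    "sad",     3L,          "green",
  "t3",    "sad",     3L,          "green"
)

result <- data_na %>%
  group_by(variable) %>%
  mutate(
    # NA if the group has no 't1' row or its 't1' value is missing
    baseline = value[match("t1", parent)],
    diff = value - baseline,
    # Only recolour when the difference is known and at least 2;
    # otherwise leave the existing colour untouched
    colour = if_else(!is.na(diff) & diff >= 2, "blue", colour)
  ) %>%
  ungroup() %>%
  select(-baseline, -diff)

result$colour
```

Here only 'happy' at 't2' turns 'blue'; the 'sad' rows keep their existing colours because their baseline is unknown. If you would prefer to flag those rows instead, you could swap the condition for a case_when() with an explicit branch for is.na(diff).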

Upvotes: 3
