ah bon
ah bon

Reputation: 10051

Update one column (with NAs) based on the conditions of another column using R

Given a small data set as follows:

df <- structure(list(date = c("2021-09", "2021-10", "2021-11", "2021-12", 
"2021-06", "2021-10"), act_direction = c("decrease", "increase", 
NA, NA, "unchanged", "unchanged"), pred_direction = c(NA, "decrease", 
NA, NA, "decrease", "increase"), direction_acc = c("true", "-", 
"-", "true", "false", "false")), class = "data.frame", row.names = c(NA, 
-6L))

df:

     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>             -
4 2021-12          <NA>           <NA>          true
5 2021-06     unchanged       decrease         false
6 2021-10     unchanged       increase         false

I try to update direction_acc based on act_direction column, more specifically, if act_direction==unchanged, then I want to update to the content of direction_acc to true, ignoring whatever its original values are.

The expected result:

     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>             -
4 2021-12          <NA>           <NA>          true
5 2021-06     unchanged       decrease          true
6 2021-10     unchanged       increase          true

I used code below which I find no error, but it returns an outcome which is out of expectation since other values in direction_acc were changed to NAs:

df %>% 
  # mutate_all(na_if, '') %>% # I will replace empty cell '' to NA as example data
  mutate(direction_acc = ifelse(act_direction == 'unchanged', 
                                'true', 
                                as.character(direction_acc)))

Result:

     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>          <NA>
4 2021-12          <NA>           <NA>          <NA>
5 2021-06     unchanged       decrease          true
6 2021-10     unchanged       increase          true

So my question is why when act_direction with NAs, direction_acc also become NAs, and how could we achieve that correctly?

Upvotes: 0

Views: 106

Answers (1)

r2evans
r2evans

Reputation: 160637

base R

df$direction_acc[df$act_direction == "unchanged"] <- "true"
df
#      date act_direction pred_direction direction_acc
# 1 2021-09                                       true
# 2 2021-10                     decrease             -
# 3 2021-11                                          -
# 4 2021-12                                       true
# 5 2021-06     unchanged       decrease          true
# 6 2021-10     unchanged       increase          true

dplyr

library(dplyr)
df %>%
  mutate(
    direction_acc = if_else(act_direction == "unchanged", "true", direction_acc)
  )

If I add the NAs as you have (I'll use across as mutate_if is deprecated), then we can change == to %in% for the desired effect.

df %>%
  mutate(across(where(is.character), ~ na_if(., ""))) %>%
  mutate(
    direction_acc = if_else(act_direction %in% "unchanged", "true", direction_acc)
  )
#      date act_direction pred_direction direction_acc
# 1 2021-09          <NA>           <NA>          true
# 2 2021-10          <NA>       decrease             -
# 3 2021-11          <NA>           <NA>             -
# 4 2021-12          <NA>           <NA>          true
# 5 2021-06     unchanged       decrease          true
# 6 2021-10     unchanged       increase          true

Upvotes: 1

Related Questions