beri

Reputation: 89

IFELSE in R returning incorrect values

I have a data frame with categorical values that were entered manually, and it contains several mistakes. Someone cleaned up the bad data, and I loaded the corrections into R and merged them with the rest of my data. Everything so far is good.

As an example, let's say this is the data I have, with the original values (a mix of good and bad data) in the "Value" column and the corrections of the bad data in the "Value_Clean" column. Obviously this is a small example, but my actual data frame has dozens of corrections of different values and several thousand rows.

test <- data.frame(ID = c(1, 2, 3)
               , Value = c("Discuss plan", "Discuss plan", "Discuss paln")
               , Value_Clean = c(NA, NA, "Discuss plan"))

I would like to create a new column called "Value_Final" that has "Discuss plan" for IDs 1, 2, and 3.

It seems pretty straightforward that I should be able to accomplish this with an ifelse:

test$Value_Final <- ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)

However, when I do that I get the following:

> test
  ID        Value  Value_Clean Value_Final
1  1 Discuss plan         <NA>           2
2  2 Discuss plan         <NA>           2
3  3 Discuss paln Discuss plan           1

What the hell? I feel like I've done similar things with ifelse in R without running into this issue, so what is going on?

Thanks!

Upvotes: 2


Answers (3)

IceCreamToucan

Reputation: 28675

The dplyr version of ifelse, if_else, doesn't have this issue:

ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)

# [1] 2 2 1


dplyr::if_else(is.na(test$Value_Clean), test$Value, test$Value_Clean)

# [1] Discuss plan Discuss plan Discuss plan
# Levels: Discuss paln Discuss plan

FYI, for this particular example you might use coalesce instead:

dplyr::coalesce(test$Value_Clean, test$Value)
# [1] Discuss plan Discuss plan Discuss plan
# Levels: Discuss plan
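
For readers not using dplyr, the same fill-in can be sketched in base R. This is only an illustration, assuming the columns are character vectors (hence stringsAsFactors = FALSE below); the name `fix` is a hypothetical helper, not something from the question.

```r
# Build the question's data as character columns (an assumption of this sketch).
test <- data.frame(ID = c(1, 2, 3),
                   Value = c("Discuss plan", "Discuss plan", "Discuss paln"),
                   Value_Clean = c(NA, NA, "Discuss plan"),
                   stringsAsFactors = FALSE)

# Start from the original values, then overwrite the rows that have a correction.
test$Value_Final <- test$Value
fix <- !is.na(test$Value_Clean)   # hypothetical helper: TRUE where a correction exists
test$Value_Final[fix] <- test$Value_Clean[fix]

test$Value_Final
# [1] "Discuss plan" "Discuss plan" "Discuss plan"
```

Like coalesce, this takes the corrected value wherever one exists and falls back to the original otherwise.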

Upvotes: 3

maRtin

Reputation: 6516

You could try dplyr and tibbles as an alternative:

library(dplyr)

tibble(ID = c(1, 2, 3)
       , Value = c("Discuss plan", "Discuss plan", "Discuss paln")
       , Value_Clean = c(NA, NA, "Discuss plan")) %>% 
  mutate(Value_Final = ifelse(is.na(Value_Clean), Value, Value_Clean))

Tibbles don't convert character columns to factors by default, which comes in handy in many cases.

Edit: use as_tibble(dataframe) to convert an existing dataframe to a tibble
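
One caveat worth sketching: converting to a tibble leaves columns that are already factors as factors, so an existing data frame would still need its factor columns turned back into character first. A minimal base R sketch, assuming the data was built with factor columns as in the question; `is_fct` is a hypothetical helper name.

```r
# Recreate the question's data with factor columns. stringsAsFactors = TRUE is
# set explicitly because it is no longer the default from R 4.0.0 onwards.
test <- data.frame(ID = c(1, 2, 3),
                   Value = c("Discuss plan", "Discuss plan", "Discuss paln"),
                   Value_Clean = c(NA, NA, "Discuss plan"),
                   stringsAsFactors = TRUE)

# Find every factor column and convert it to character, leaving ID untouched.
is_fct <- vapply(test, is.factor, logical(1))   # hypothetical helper
test[is_fct] <- lapply(test[is_fct], as.character)

# With character columns, ifelse returns the labels rather than integer codes.
test$Value_Final <- ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)
test$Value_Final
# [1] "Discuss plan" "Discuss plan" "Discuss plan"
```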

Upvotes: 1

akrun

Reputation: 886948

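This is a case of a factor being coerced to its integer storage values: data.frame() turned the character columns into factors (the default before R 4.0.0), and ifelse() returns the underlying integer codes instead of the labels. It can be corrected with stringsAsFactors = FALSE when creating the data.frame: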

test <- data.frame(ID = c(1, 2, 3)
                , Value = c("Discuss plan", "Discuss plan", "Discuss paln")
                , Value_Clean = c(NA, NA, "Discuss plan"), stringsAsFactors = FALSE)
ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)
#[1] "Discuss plan" "Discuss plan" "Discuss plan"

Or, if the data is already created, the two factor columns (Value and Value_Clean, columns 2 and 3) can be converted to character with as.character:

test[2:3] <- lapply(test[2:3], as.character)

Or do the conversion within the ifelse:

ifelse(is.na(test$Value_Clean), as.character(test$Value), 
         as.character(test$Value_Clean))
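
To see where the 2, 2, 1 in the question comes from, it can help to look at the factor levels and their integer codes directly. The sketch below assumes the data frame was built with factor columns (the situation in the question) and that the levels sort in the usual alphabetical order; `codes` is just an illustrative name.

```r
# Recreate the question's data with factor columns. stringsAsFactors = TRUE is
# set explicitly because it is no longer the default from R 4.0.0 onwards.
test <- data.frame(ID = c(1, 2, 3),
                   Value = c("Discuss plan", "Discuss plan", "Discuss paln"),
                   Value_Clean = c(NA, NA, "Discuss plan"),
                   stringsAsFactors = TRUE)

# Levels are sorted, so the misspelled "Discuss paln" is level 1.
levels(test$Value)             # "Discuss paln" "Discuss plan"
as.integer(test$Value)         # 2 2 1

# Value_Clean has a single level, so its only non-missing code is 1.
levels(test$Value_Clean)       # "Discuss plan"
as.integer(test$Value_Clean)   # NA NA 1

# ifelse drops the factor attributes and keeps only the integer codes:
# rows 1 and 2 come from Value (code 2), row 3 from Value_Clean (code 1).
codes <- ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)
codes                          # 2 2 1
```

The two columns also use different level sets, so the same label ("Discuss plan") is code 2 in one column and code 1 in the other, which is why the integer result is not meaningful.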

Upvotes: 5
