Reputation: 89
I have a data frame with categorical values that were entered manually and there are several mistakes. Someone cleaned up the bad data and I loaded that into R and merged that with the rest of my data. Everything so far is good.
As an example, let's say this is the data I have with the original (mix of good and bad data) in the "Value" column and the corrections of the bad data in the "Value_Clean" column. Obviously this is a small example but my actual data frame has dozens of corrections of different values and several thousand rows.
test <- data.frame(ID = c(1, 2, 3)
, Value = c("Discuss plan", "Discuss plan", "Discuss paln")
, Value_Clean = c(NA, NA, "Discuss plan"))
I would like to create a new column called "Value_Final" that has "Discuss plan" for IDs 1, 2, and 3.
It seems pretty straightforward that I should be able to accomplish this with an ifelse:
test$Value_Final <- ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)
However, when I do that I get the following:
> test
  ID        Value  Value_Clean Value_Final
1  1 Discuss plan         <NA>           2
2  2 Discuss plan         <NA>           2
3  3 Discuss paln Discuss plan           1
What the hell? I feel like I've done similar things with ifelse in R without running into this issue, so what is going on?
Thanks!
Upvotes: 2
Views: 791
Reputation: 28675
The dplyr version of ifelse, if_else, doesn't have this issue:
ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)
# [1] 2 2 1
dplyr::if_else(is.na(test$Value_Clean), test$Value, test$Value_Clean)
# [1] Discuss plan Discuss plan Discuss plan
# Levels: Discuss paln Discuss plan
FYI for this particular example you might use coalesce instead
dplyr::coalesce(test$Value_Clean, test$Value)
# [1] Discuss plan Discuss plan Discuss plan
# Levels: Discuss plan
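If dplyr is not available, the same "take the correction when there is one, otherwise keep the original" idea can be sketched in base R. This is only an illustration, not how coalesce is implemented: first_non_missing is a hypothetical helper name, and stringsAsFactors = TRUE is set explicitly on the assumption that the question's data was built under the pre-4.0 default, where strings became factors.

```r
# Question's data, with factors forced on so the sketch behaves the same on R >= 4.0.
test <- data.frame(ID = c(1, 2, 3),
                   Value = c("Discuss plan", "Discuss plan", "Discuss paln"),
                   Value_Clean = c(NA, NA, "Discuss plan"),
                   stringsAsFactors = TRUE)

# Hypothetical helper: return x, falling back to y wherever x is NA.
first_non_missing <- function(x, y) {
  x <- as.character(x)  # work on the labels, not on the factor codes
  y <- as.character(y)
  missing <- is.na(x)
  x[missing] <- y[missing]
  x
}

test$Value_Final <- first_non_missing(test$Value_Clean, test$Value)
test$Value_Final
```

Converting to character inside the helper means it should not matter whether the inputs arrive as factors or as plain strings.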
Upvotes: 3
Reputation: 6516
You could try dplyr and tibbles as an alternative:
library(dplyr)
tibble(ID = c(1, 2, 3)
, Value = c("Discuss plan", "Discuss plan", "Discuss paln")
, Value_Clean = c(NA, NA, "Discuss plan")) %>%
mutate(Value_Final = ifelse(is.na(Value_Clean), Value, Value_Clean))
Tibbles don't convert character columns to factors by default, which comes in handy in many cases.
Edit:
Use as_tibble(dataframe) to convert an existing data frame to a tibble.
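One caveat worth checking: converting an existing data frame does not change the type of columns that are already factors, so it may help to inspect the column classes and convert them first. The sketch below uses base R only and assumes the data was created with stringsAsFactors = TRUE (the default before R 4.0.0); is_fct is just an illustrative variable name.

```r
# Question's data, built with factor columns.
test <- data.frame(ID = c(1, 2, 3),
                   Value = c("Discuss plan", "Discuss plan", "Discuss paln"),
                   Value_Clean = c(NA, NA, "Discuss plan"),
                   stringsAsFactors = TRUE)

# Inspect the column classes; Value and Value_Clean show up as "factor".
sapply(test, class)

# Convert every factor column to character, leaving other columns alone.
is_fct <- vapply(test, is.factor, logical(1))
test[is_fct] <- lapply(test[is_fct], as.character)

# With character columns, plain ifelse() returns the labels.
test$Value_Final <- ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)
test$Value_Final
```

Selecting columns by class rather than by position avoids having to keep track of which column numbers hold the text values.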
Upvotes: 1
Reputation: 886948
This is a case of a factor being coerced to its integer storage values. It can be corrected by passing stringsAsFactors = FALSE when creating the data.frame:
test <- data.frame(ID = c(1, 2, 3)
, Value = c("Discuss plan", "Discuss plan", "Discuss paln")
, Value_Clean = c(NA, NA, "Discuss plan"), stringsAsFactors = FALSE)
ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)
#[1] "Discuss plan" "Discuss plan" "Discuss plan"
Or, if the data frame has already been created, convert the Value and Value_Clean columns to character with as.character:
test[2:3] <- lapply(test[2:3], as.character)
Or do this within the ifelse
ifelse(is.na(test$Value_Clean), as.character(test$Value),
as.character(test$Value_Clean))
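To see where the 2 2 1 in the question comes from, it can help to look at the codes each factor stores. This is a minimal sketch that assumes the original data frame was built with factor columns (the default before R 4.0.0, forced here with stringsAsFactors = TRUE); codes is just an illustrative variable name.

```r
# Question's data, with strings stored as factors.
test <- data.frame(ID = c(1, 2, 3),
                   Value = c("Discuss plan", "Discuss plan", "Discuss paln"),
                   Value_Clean = c(NA, NA, "Discuss plan"),
                   stringsAsFactors = TRUE)

# A factor stores integer codes that index into its own sorted levels.
levels(test$Value)            # "Discuss paln" "Discuss plan"
as.integer(test$Value)        # 2 2 1
levels(test$Value_Clean)      # "Discuss plan"
as.integer(test$Value_Clean)  # NA NA 1

# ifelse() does not keep the factor attributes, so only the codes survive:
# rows 1 and 2 take code 2 from Value, row 3 takes code 1 from Value_Clean.
codes <- ifelse(is.na(test$Value_Clean), test$Value, test$Value_Clean)
codes                         # 2 2 1
```

Note that the two columns have different level sets, so the same label can map to different codes in each column, which is another reason to work with the labels rather than the codes.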
Upvotes: 5