Jean V. Adams

Reputation: 4784

Recode character vector with some empty strings

I have been using the dplyr::recode() function to recode some variables. I have one character variable with some empty strings that I would also like to recode. But if I refer to the empty string in the arguments to the function, I get an error.

# input
x <- c("a", "b", "", "x", "y", "z")
# desired output
c("Apple", "Banana", "Missing", "x", "y", "z")

dplyr::recode(x, "a"="Apple", "b"="Banana", ""="Missing")

Error: attempt to use zero-length variable name

If I instead try to handle the empty string through the .missing argument, the function leaves it as an empty string.

dplyr::recode(x, "a"="Apple", "b"="Banana", .missing="Missing")

[1] "Apple"  "Banana" ""       "x"      "y"      "z"     

How can I recode the values to get the desired output?

Upvotes: 2

Views: 1840

Answers (3)

Pierre Lapointe

Reputation: 16277

The .missing argument only applies to NA values, so convert the empty strings to NA with na_if first:

x <- c("a", "b", "", "x", "y", "z")
dplyr::recode(dplyr::na_if(x, ""), "a"="Apple", "b"="Banana", .missing="Missing")

[1] "Apple"   "Banana"  "Missing" "x"       "y"       "z" 
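
To see why this works, it may help to look at the intermediate step. This is only a sketch, assuming dplyr is installed; the names x_na and recoded are illustrative and not part of the original answer.

```r
library(dplyr)

x <- c("a", "b", "", "x", "y", "z")

# Step 1: na_if() turns every "" into NA and leaves other values alone
x_na <- na_if(x, "")
print(x_na)     # the third element is now NA

# Step 2: recode() applies .missing to NA values only;
# values without a replacement ("x", "y", "z") pass through unchanged
recoded <- recode(x_na, a = "Apple", b = "Banana", .missing = "Missing")
print(recoded)  # "Apple" "Banana" "Missing" "x" "y" "z"
```

One caveat: any values that were already NA in x would also come out as "Missing", since after na_if they cannot be told apart from the former empty strings.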

Upvotes: 7

jess

Reputation: 534

In these cases, I use ifelse. Your example would be: x <- ifelse(x == "", "Missing", x).

In a data.frame context, you can use it inside mutate:

library(dplyr)  # provides %>% and mutate()

df_x <- data.frame(col1 = c("a", "b", "", "x", "y", "z"))
df_new <- df_x %>%
  mutate(col1 = ifelse(col1 == "", "Missing", col1))
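
Note that ifelse on its own only replaces the empty string; "a" and "b" still need recoding to reach the desired output from the question. A minimal sketch of chaining the two steps, assuming dplyr is installed (the names x_filled and out are illustrative):

```r
library(dplyr)

x <- c("a", "b", "", "x", "y", "z")

# Replace the empty string first, because recode() cannot take "" as an argument name
x_filled <- ifelse(x == "", "Missing", x)

# Then recode the remaining values; unmatched values are left as they are
out <- recode(x_filled, a = "Apple", b = "Banana")
print(out)  # "Apple" "Banana" "Missing" "x" "y" "z"
```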

Upvotes: 0

lmo

Reputation: 38510

Why not use base R's factor?

myFac <- factor(x, levels=x, labels=c("Apple", "Banana", "Missing", "x", "y", "z"))
myFac
[1] Apple   Banana  Missing x       y       z      
Levels: Apple Banana Missing x y z

If desired, you can convert this to a character vector:

as.character(myFac)
[1] "Apple"   "Banana"  "Missing" "x"       "y"       "z"
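
The factor call above uses levels=x, which relies on every value in x being unique; with repeated values, factor would complain about duplicated levels. If that is a concern, one base R alternative is a lookup with match(), which does match empty strings. This is a sketch rather than part of the original answer, and the names old, new, idx and out are illustrative.

```r
# Example input with repeated values, including a repeated empty string
x <- c("a", "b", "", "x", "y", "z", "a", "")

old <- c("a", "b", "")                    # values to replace
new <- c("Apple", "Banana", "Missing")    # their replacements, in the same order

# match() returns the position of each x in old, or NA when there is no match
idx <- match(x, old)

# Keep the original value where nothing matched, otherwise take the replacement
out <- ifelse(is.na(idx), x, new[idx])
print(out)  # "Apple" "Banana" "Missing" "x" "y" "z" "Apple" "Missing"
```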

Upvotes: 2
