Reputation: 1577
I would like to mutate a string differently, depending on the format. This example has 2 formats based on inclusion of certain punctuation. Each element of the vector contains specific words uniquely associated with the format.
I have tried multiple approaches with ifelse and case_when but am not getting the desired result, which is to "keep" the last part of the string.
I am trying to use easy verbs and am not proficient in regex. I am open to any suggestions for an efficient, general method.
library(dplyr)
library(stringr)
df <- data.frame(KPI = c("xxxxx.x...Alpha...Keep.1",
"xxxxx.x...Alpha..Keep.2",
"Bravo...Keep3",
"Bravo...Keep4",
"xxxxx...Charlie...Keep.5",
"xxxxx...Charlie...Keep.6"))
dot3dot3split <- function(x) strsplit(x, "..." , fixed = TRUE)[[1]][3]
dot3dot3split("xxxxx.x...Alpha...Keep.1") # returns as expected
# [1] "Keep.1"
dot3split <- function(x) strsplit(x, "..." , fixed = TRUE)[[1]][2]
dot3split("Bravo...Keep3") # returns as expected
# [1] "Keep3"
df1 <- df %>% mutate_if(is.factor, as.character) %>%
mutate(KPI.v2 = ifelse(str_detect(KPI, paste(c("Alpha", "Charlie"), collapse = '|')), dot3dot3split(KPI),
ifelse(str_detect(KPI, "Bravo"), dot3split(KPI), KPI))) # not working as expected
df1$KPI.v2
# [1] "Keep.1" "Keep.1" "Alpha" "Alpha" "Keep.1" "Keep.1"
Upvotes: 2
Views: 3438
Reputation: 39154
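The functions you designed (dot3dot3split and dot3split) are not vectorized. The [[1]] keeps only the split of the first element, so when the input has more than one element, only the result for the first one is returned. Inside ifelse, that single value is then recycled across all rows, which is why the same string repeats in your output.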
dot3dot3split(c("xxxxx.x...Alpha...Keep.1", "xxxxx.x...Alpha..Keep.2"))
# [1] "Keep.1"
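As a rough illustration of what a vectorized version of these helpers could look like, here is a minimal base R sketch. The name nth_piece and its argument n are hypothetical, not from the original post; it assumes the separator is always a literal "...".

```r
# Hypothetical helper (not from the original post): split every element on a
# literal "..." and return the n-th piece of each one.
nth_piece <- function(x, n) {
  pieces <- strsplit(x, "...", fixed = TRUE)      # list with one vector per element
  vapply(pieces, function(p) p[n], character(1))  # NA when the n-th piece is missing
}

nth_piece(c("xxxxx.x...Alpha...Keep.1", "xxxxx...Charlie...Keep.5"), 3)
# [1] "Keep.1" "Keep.5"
nth_piece(c("Bravo...Keep3", "Bravo...Keep4"), 2)
# [1] "Keep3" "Keep4"
```

Because it returns one value per input element, a helper like this can be used inside ifelse or mutate. Note that "xxxxx.x...Alpha..Keep.2" has only two dots before the last part, so splitting on "..." gives NA for its third piece.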
Since you are using stringr, I suggest using str_extract (or another vectorized function) to extract the string you want, without ifelse.
df <- data.frame(KPI = c("xxxxx.x...Alpha...apples",
"xxxxx.x...Alpha..bananas",
"Bravo...oranges",
"Bravo...grapes",
"xxxxx...Charlie...cherries",
"xxxxx...Charlie...guavas"))
library(dplyr)
library(stringr)
df1 <- df %>%
mutate_if(is.factor, as.character) %>%
mutate(KPI.v2 = str_extract(KPI, "[A-Za-z]*$"))
df1
#                          KPI   KPI.v2
# 1   xxxxx.x...Alpha...apples   apples
# 2   xxxxx.x...Alpha..bananas  bananas
# 3            Bravo...oranges  oranges
# 4             Bravo...grapes   grapes
# 5 xxxxx...Charlie...cherries cherries
# 6   xxxxx...Charlie...guavas   guavas
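One caveat: the pattern "[A-Za-z]*$" only matches a trailing run of letters, so it would not capture values such as "Keep.1" from the question's data, which end in a dot and a digit. A possible alternative is to key on the separator instead, removing everything up to the last run of two or more dots. This is only a sketch, and it assumes the part to keep never contains two consecutive dots.

```r
library(dplyr)

# Data from the question
df <- data.frame(KPI = c("xxxxx.x...Alpha...Keep.1",
                         "xxxxx.x...Alpha..Keep.2",
                         "Bravo...Keep3",
                         "Bravo...Keep4",
                         "xxxxx...Charlie...Keep.5",
                         "xxxxx...Charlie...Keep.6"),
                 stringsAsFactors = FALSE)

# Greedy ".*" followed by two or more dots matches up to the last ".." or "...",
# so sub() leaves only the text after that final separator.
df1 <- df %>%
  mutate(KPI.v2 = sub(".*\\.{2,}", "", KPI))

df1$KPI.v2
# [1] "Keep.1" "Keep.2" "Keep3"  "Keep4"  "Keep.5" "Keep.6"
```

sub is vectorized, so no ifelse is needed, and a string without any separator is returned unchanged.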
Upvotes: 2