Michael Bellhouse

Reputation: 1577

dplyr mutate stringr str_detect with multiple conditional arguments and corresponding output

I would like to mutate a string differently depending on its format. This example has two formats, which differ in their punctuation (how many "..." separators they contain). Each element of the vector contains specific words (Alpha, Bravo, Charlie) uniquely associated with its format.

I have tried multiple approaches with ifelse and case_when but am not getting the desired result, which is to keep the last part of the string.

I am trying to use simple verbs and am not proficient in regex. I am open to any suggestions for an efficient, general method.

library(dplyr)
library(stringr)
df <- data.frame(KPI = c("xxxxx.x...Alpha...Keep.1",
                     "xxxxx.x...Alpha..Keep.2",
                     "Bravo...Keep3",
                     "Bravo...Keep4",
                     "xxxxx...Charlie...Keep.5",
                     "xxxxx...Charlie...Keep.6"))

dot3dot3split <- function(x) strsplit(x, "...", fixed = TRUE)[[1]][3]
dot3dot3split("xxxxx.x...Alpha...Keep.1") # returns as expected
# [1] "Keep.1"

dot3split <- function(x) strsplit(x, "...", fixed = TRUE)[[1]][2]
dot3split("Bravo...Keep3") # returns as expected
# [1] "Keep3"

df1 <- df %>% mutate_if(is.factor, as.character) %>%
        mutate(KPI.v2 = ifelse(str_detect(KPI, paste(c("Alpha", "Charlie"), collapse = '|')), dot3dot3split(KPI), 
                               ifelse(str_detect(KPI, "Bravo"), dot3split(KPI), KPI))) # not working as expected

df1$KPI.v2
# [1] "Keep.1" "Keep.1" "Alpha"  "Alpha"  "Keep.1" "Keep.1"

Upvotes: 2

Views: 3438

Answers (1)

www

Reputation: 39154

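The functions you wrote (dot3dot3split and dot3split) are not vectorized. Because they take [[1]], they only process the first element of their input, so when given a vector of several strings they return a single value. Inside ifelse, that single value is then recycled across every row, which is why "Keep.1" and "Alpha" keep repeating in your output.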

dot3dot3split(c("xxxxx.x...Alpha...Keep.1", "xxxxx.x...Alpha..Keep.2"))
# [1] "Keep.1" 
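If you want to keep the original ifelse/case_when structure, one option is to make the helper work element by element. The sketch below is an illustration, not the question's code: nth_piece is a hypothetical name, it uses case_when instead of nested ifelse, and it sets stringsAsFactors = FALSE so the mutate_if step is not needed.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: for EACH string, return the n-th piece after splitting
# on the literal "...", or NA when a string has fewer than n pieces.
nth_piece <- function(x, n) {
  pieces <- strsplit(x, "...", fixed = TRUE)  # a list: one character vector per string
  vapply(pieces,
         function(p) if (length(p) >= n) p[[n]] else NA_character_,
         character(1))                         # exactly one string per input element
}

df <- data.frame(KPI = c("xxxxx.x...Alpha...Keep.1",
                         "xxxxx.x...Alpha..Keep.2",
                         "Bravo...Keep3",
                         "Bravo...Keep4",
                         "xxxxx...Charlie...Keep.5",
                         "xxxxx...Charlie...Keep.6"),
                 stringsAsFactors = FALSE)

# Each condition now selects from a full-length vector instead of a recycled scalar.
df1 <- df %>%
  mutate(KPI.v2 = case_when(
    str_detect(KPI, "Alpha|Charlie") ~ nth_piece(KPI, 3),
    str_detect(KPI, "Bravo")         ~ nth_piece(KPI, 2),
    TRUE                             ~ KPI
  ))
df1$KPI.v2
```

Row 2 comes back NA rather than "Keep.2" because "xxxxx.x...Alpha..Keep.2" has only two dots before Keep, so splitting on the literal "..." yields two pieces, not three.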

Since you are using stringr, I suggest using str_extract to pull out the string you want directly, without ifelse or helper functions that would need to be vectorized.

df <- data.frame(KPI = c("xxxxx.x...Alpha...apples",
                         "xxxxx.x...Alpha..bananas",
                         "Bravo...oranges",
                         "Bravo...grapes",
                         "xxxxx...Charlie...cherries",
                         "xxxxx...Charlie...guavas"))

library(dplyr)
library(stringr)

df1 <- df %>%
  mutate_if(is.factor, as.character) %>%
  mutate(KPI.v2 = str_extract(KPI, "[A-Za-z]*$"))
df1
#                          KPI   KPI.v2
# 1   xxxxx.x...Alpha...apples   apples
# 2   xxxxx.x...Alpha..bananas  bananas
# 3            Bravo...oranges  oranges
# 4             Bravo...grapes   grapes
# 5 xxxxx...Charlie...cherries cherries
# 6   xxxxx...Charlie...guavas   guavas
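The pattern [A-Za-z]*$ fits the renamed example above, but it only captures trailing letters, so it cannot return values like "Keep.1" or "Keep3" from the question's original strings. A minimal sketch for that original data, assuming the part to keep is always whatever follows the last run of two or more dots, uses base R sub to delete everything up to that run. Because it does not depend on the words Alpha, Bravo or Charlie, it also handles the "Alpha..Keep.2" row.

```r
# Original KPI strings from the question (the part to keep may contain dots and digits).
kpi <- c("xxxxx.x...Alpha...Keep.1",
         "xxxxx.x...Alpha..Keep.2",
         "Bravo...Keep3",
         "Bravo...Keep4",
         "xxxxx...Charlie...Keep.5",
         "xxxxx...Charlie...Keep.6")

# Greedy "^.*" extends as far as possible while still being followed by a run
# of 2+ dots, i.e. up to the LAST such run; removing that prefix leaves the tail.
keep <- sub("^.*\\.{2,}", "", kpi)
keep

# stringr equivalent inside mutate(): mutate(KPI.v2 = str_remove(KPI, "^.*\\.{2,}"))
```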

Upvotes: 2
