Reputation: 1393
I have a data frame in R with a column that holds a long string. I want to use that string to create a new column with specific values.
This is the sample dataframe:
dom <- data.frame(
Site = c("alpha", "beta", "charlie", "delta"),
Banner = c("testing_Watermelon -DPI_300x250 v2" , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)
Now, if the column Banner contains the string Watermelon or Vanilla, the new column label should hold just Watermelon or Vanilla; otherwise it should hold Default.
How can I use grep or anything else to apply multiple conditions like this? Below is what the expected data frame should look like:
dom_output <- data.frame(
Site = c("alpha", "beta", "charlie", "delta"),
Banner = c("testing_Watermelon -DPI_300x250 v2" , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480"),
label = c("Watermelon","Vanilla","Default","Default")
)
Upvotes: 3
Views: 5137
Reputation: 28826
library(dplyr)
library(stringi)
dom %>% mutate(label = case_when(stri_detect_fixed(Banner, "Watermelon") ~ "Watermelon",
stri_detect_fixed(Banner, "Vanilla") ~ "Vanilla",
TRUE ~ "Default"))
#> Site Banner label
#> 1 alpha testing_Watermelon -DPI_300x250 v2 Watermelon
#> 2 beta notest_Vanilla Latte-DPI_300x250 v2 Vanilla
#> 3 charlie bottle :15s Default
#> 4 delta aaaa vvvv cccc Build_Mobile_320x480 Default
Data:
dom <- data.frame(Site = c("alpha", "beta", "charlie", "delta"),
Banner = c("testing_Watermelon -DPI_300x250 v2",
"notest_Vanilla Latte-DPI_300x250 v2",
"bottle :15s",
"aaaa vvvv cccc Build_Mobile_320x480"))
Upvotes: 0
Reputation: 39858
One base R possibility could be:
labels <- paste(c("Watermelon", "Orange"), collapse = "|")
dom$label <- sapply(regmatches(dom$Banner, regexec(labels, dom$Banner)), "[", 1)
dom$label[is.na(dom$label)] <- "Default"
Site Banner label
1 alpha testing_Watermelon -DPI_300x250 v2 Watermelon
2 beta notest_Orange Latte-DPI_300x250 v2 Orange
3 charlie bottle :15s Default
4 delta aaaa vvvv cccc Build_Mobile_320x480 Default
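To see why the is.na() step is needed, it may help to inspect the intermediate result. This is only a small sketch on a shortened two-element vector; the names banner, pattern, m and first are made up for illustration and are not part of the answer above.

```r
# Two sample strings: one that contains a keyword, one that does not.
banner <- c("testing_Watermelon -DPI_300x250 v2", "bottle :15s")
pattern <- paste(c("Watermelon", "Orange"), collapse = "|")

# regmatches() on a regexec() result returns a list with one element per string.
# A string without a match yields character(0).
m <- regmatches(banner, regexec(pattern, banner))
lengths(m)
# 1 0

# Taking element 1 of character(0) gives NA, which is later replaced by "Default".
first <- sapply(m, "[", 1)
is.na(first)
# FALSE TRUE
```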
The same regmatches() approach also works with dplyr and tidyr:
library(dplyr)
library(tidyr)
dom %>%
mutate(label = sapply(regmatches(Banner, regexec(labels, Banner)), "[", 1),
label = replace_na(label, "Default"))
Sample data:
dom <- data.frame(
Site = c("alpha", "beta", "charlie", "delta"),
Banner = c("testing_Watermelon -DPI_300x250 v2" , "notest_Orange Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)
Upvotes: 0
Reputation: 2945
Here's a simple solution using Base R:
#Sample data:
dom <- data.frame(
Site = c("alpha", "beta", "charlie", "delta"),
Banner = c("testing_Watermelon -DPI_300x250 v2" , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)
dom$label <- ifelse(grepl("watermelon", dom$Banner, ignore.case = TRUE), "Watermelon",
ifelse(grepl("vanilla", dom$Banner, ignore.case = TRUE), "Vanilla", "Default"))
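If the list of keywords grows, nesting ifelse() calls gets hard to read. One possible way to generalise the same grepl() idea is sketched below; label_banner and its arguments are hypothetical names introduced here, and the sketch assumes the keywords are plain words with no regex metacharacters.

```r
# Hypothetical helper: label each string with the first keyword it contains.
label_banner <- function(x, keywords, default = "Default") {
  label <- rep(default, length(x))
  # Walk the keywords in reverse so that earlier keywords win on overlap.
  for (kw in rev(keywords)) {
    hit <- grepl(kw, x, ignore.case = TRUE)
    label[hit] <- kw
  }
  label
}

dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2",
             "notest_Vanilla Latte-DPI_300x250 v2",
             "bottle :15s",
             "aaaa vvvv cccc Build_Mobile_320x480")
)

dom$label <- label_banner(dom$Banner, c("Watermelon", "Vanilla"))
dom$label
# "Watermelon" "Vanilla" "Default" "Default"
```

The label is taken from the keyword vector rather than from the matched text, so the capitalisation of the result follows the keywords, as in the nested ifelse() version.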
Upvotes: 0
Reputation: 145755
library(stringr)
dom$label = str_extract(dom$Banner, "Watermelon|Vanilla")
dom$label[is.na(dom$label)] <- "Default"
dom
# Site Banner label
# 1 alpha testing_Watermelon -DPI_300x250 v2 Watermelon
# 2 beta notest_Vanilla Latte-DPI_300x250 v2 Vanilla
# 3 charlie bottle :15s Default
# 4 delta aaaa vvvv cccc Build_Mobile_320x480 Default
Upvotes: 5