SNT

Reputation: 1393

Creating new column based on string values from another column

I have a data frame in R with a column that holds a long string. I want to use that string to create a new column with specific values.

This is the sample dataframe:

dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)

If the Banner column contains Watermelon or Vanilla, the new column label should hold just that word (Watermelon or Vanilla); otherwise it should be Default.

How can I use grep, or anything else, to apply multiple conditions like this? The expected data frame is shown below.

dom_output <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480"),
  label  = c("Watermelon","Vanilla","Default","Default")
)

Upvotes: 3

Views: 5137

Answers (4)

M--

Reputation: 28826

library(dplyr)
library(stringi)

dom %>% mutate(label = case_when(stri_detect_fixed(Banner, "Watermelon") ~ "Watermelon",
                                 stri_detect_fixed(Banner, "Vanilla")    ~ "Vanilla",
                                                                   TRUE  ~ "Default"))
#>      Site                              Banner          label
#> 1   alpha  testing_Watermelon -DPI_300x250 v2     Watermelon
#> 2    beta notest_Vanilla Latte-DPI_300x250 v2        Vanilla
#> 3 charlie                         bottle :15s        Default
#> 4   delta aaaa vvvv cccc Build_Mobile_320x480        Default

Data:

dom <- data.frame(Site = c("alpha", "beta", "charlie", "delta"),
                  Banner = c("testing_Watermelon -DPI_300x250 v2",
                             "notest_Vanilla Latte-DPI_300x250 v2",
                             "bottle :15s",
                             "aaaa vvvv cccc Build_Mobile_320x480"))

Upvotes: 0

tmfmnk

Reputation: 39858

One base R possibility could be:

labels <- paste(c("Watermelon", "Orange"), collapse = "|")

dom$label <- sapply(regmatches(dom$Banner, regexec(labels, dom$Banner)), "[", 1)
dom$label[is.na(dom$label)] <- "Default"

     Site                              Banner      label
1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
2    beta  notest_Orange Latte-DPI_300x250 v2     Orange
3 charlie                         bottle :15s    Default
4   delta aaaa vvvv cccc Build_Mobile_320x480    Default

The same approach also works with dplyr and tidyr:

library(dplyr)
library(tidyr)

dom %>%
 mutate(label = sapply(regmatches(Banner, regexec(labels, Banner)), "[", 1),
        label = replace_na(label, "Default"))

Sample data:

dom <- data.frame(
 Site = c("alpha", "beta", "charlie", "delta"),
 Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Orange Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)
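
To see why the NA replacement step is needed in the base R version above, here is a minimal sketch of the intermediate values. The two-element `banner` vector and the names `m` and `first` are illustrative only and not part of the original answer.

```r
# Illustrative input: one string that matches, one that does not
banner <- c("testing_Watermelon -DPI_300x250 v2", "bottle :15s")

# regexec() + regmatches() return a list with one element per input string;
# a string with no match yields an empty character vector
m <- regmatches(banner, regexec("Watermelon|Orange", banner))
lengths(m)

# Taking element 1 of an empty vector gives NA, so non-matching rows
# end up as NA and can then be replaced with "Default"
first <- sapply(m, "[", 1)
is.na(first)
```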

Upvotes: 0

Brigadeiro

Reputation: 2945

Here's a simple solution using Base R:

#Sample data:
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Vanilla Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)


# Case-insensitive match; the first condition that is TRUE decides the label
dom$label <- ifelse(grepl("watermelon", dom$Banner, ignore.case = TRUE), "Watermelon",
                    ifelse(grepl("vanilla", dom$Banner, ignore.case = TRUE), "Vanilla", "Default"))
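
With more than a couple of keywords, nested ifelse() calls get hard to read. One possible alternative is a loop over a keyword vector, sketched below. The `keywords` vector is a hypothetical name introduced for illustration, and the sketch assumes that earlier keywords should win when a string matches several of them.

```r
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2",
             "notest_Vanilla Latte-DPI_300x250 v2",
             "bottle :15s",
             "aaaa vvvv cccc Build_Mobile_320x480")
)

# Hypothetical keyword list; earlier entries take precedence
keywords <- c("Watermelon", "Vanilla")

# Start with the fallback label for every row
dom$label <- "Default"

# Walk the keywords in reverse so that earlier ones overwrite later ones
for (kw in rev(keywords)) {
  hit <- grepl(kw, dom$Banner, ignore.case = TRUE)
  dom$label[hit] <- kw
}
```

Adding another label then only requires extending `keywords`. Note that each keyword is treated as a regular expression here, so keywords containing regex metacharacters would need escaping or `fixed = TRUE` (which cannot be combined with `ignore.case = TRUE`).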

Upvotes: 0

Gregor Thomas

Reputation: 145755

library(stringr)
dom$label <- str_extract(dom$Banner, "Watermelon|Vanilla")
dom$label[is.na(dom$label)] <- "Default"
dom
#      Site                              Banner      label
# 1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
# 2    beta notest_Vanilla Latte-DPI_300x250 v2    Vanilla
# 3 charlie                         bottle :15s    Default
# 4   delta aaaa vvvv cccc Build_Mobile_320x480    Default

Upvotes: 5
