AdilK

Reputation: 707

Use str_detect function to conditionally create a new column in R dataframe?

I have a dataframe with Column A containing values:

**Channel**
Direct
Paid social
Organic social

What I want to do: create a new column (Column B) called groupedChannel, where str_detect searches each value in Column A for a string and the result determines the value in groupedChannel.

Condition:
IF the row in Column A matches regex "direct" THEN Column B value = "Direct" ELSE
IF the row in Column A matches regex "social" THEN Column B value = "Social"

AFAIK, str_detect returns only TRUE/FALSE. How can I use that TRUE/FALSE result to assign a value in Column B?

Upvotes: 0

Views: 3622

Answers (4)

Chris Ruehlemann

Reputation: 21400

Here's a base R solution. It assumes you have a clearly defined set of Channel_group values.

Data:

data <- data.frame(Channel = c("Direct", "Paid social", "Organic social"),
                   stringsAsFactors = FALSE)

You can define your Channel_group values in a vector a:

a <- c("(S|s)ocial", "(D|d)irect")

Now use sub to replace each Channel value with the Channel_group value it contains; \\U makes sure that these values are returned as upper-case strings (use \\L if you prefer lower-case strings). Because \\U and \\L are Perl-style escapes, sub needs perl = TRUE:

data$Channel_group <- sub(paste0(".*\\b(", paste(a, collapse = "|"), ")\\b.*"), "\\U\\1", data$Channel, perl = TRUE)

Result:

data
         Channel Channel_group
1         Direct        DIRECT
2    Paid social        SOCIAL
3 Organic social        SOCIAL
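One thing to be aware of: sub() returns elements that don't match the pattern unchanged, so a Channel value containing neither keyword would be copied into Channel_group as-is. A minimal sketch, assuming you would rather have NA for those rows (the "Email" row is made up for illustration), reuses the same pattern with grepl():

```r
# Hypothetical data: "Email" matches neither keyword
data <- data.frame(Channel = c("Direct", "Paid social", "Organic social", "Email"),
                   stringsAsFactors = FALSE)

a <- c("(S|s)ocial", "(D|d)irect")
pattern <- paste0(".*\\b(", paste(a, collapse = "|"), ")\\b.*")

# \\U (upper-case the rest of the replacement) only works with perl = TRUE
data$Channel_group <- sub(pattern, "\\U\\1", data$Channel, perl = TRUE)

# sub() leaves non-matching strings untouched; replace those with NA
data$Channel_group[!grepl(pattern, data$Channel, perl = TRUE)] <- NA

data
```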

Upvotes: 0

Brian

Reputation: 8275

What you want is to match your regex, not simply detect.

library(dplyr)
library(stringr)

tibble(
  colA = c("**Channel**", "Direct", "Paid social", "Organic social")
) %>% 
  mutate(
    colB = str_match(colA, "[Ss]ocial|[Dd]irect")[,1],
    colB = str_to_lower(colB)
  )
#> # A tibble: 4 x 2
#>   colA           colB  
#>   <chr>          <chr> 
#> 1 **Channel**    <NA>  
#> 2 Direct         direct
#> 3 Paid social    social
#> 4 Organic social social

Created on 2020-04-29 by the reprex package (v0.3.0)

stringr::str_match returns a matrix in which the first column is the full match and any subsequent columns hold the capture groups, so we need [,1] at the end of that call. Because the pattern matches both upper- and lower-case spellings, we then convert the matches to lowercase with str_to_lower.
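To see the shape of what str_match returns, here is a small sketch on made-up strings with two capture groups: column 1 holds the whole match, columns 2 and 3 hold the groups, and a string with no match produces a row of NA:

```r
library(stringr)

# Made-up input; the second string matches nothing
m <- str_match(c("Paid social", "Email"), "([Pp]aid) ([Ss]ocial)")

m[, 1]  # full matches: "Paid social", NA
m[, 2]  # first capture group: "Paid", NA
m[, 3]  # second capture group: "social", NA
```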

Alternatively, you could use str_extract, which returns a character vector directly, so no [,1] is needed: colB = str_extract(colA, "[Ss]ocial|[Dd]irect").
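A sketch of the str_extract variant that also swaps the hand-written [Ss]/[Dd] character classes for stringr's regex(..., ignore_case = TRUE) and uses str_to_title so the new column reads "Direct"/"Social" as in the question. The extra "Email" row is made up to show that unmatched values become NA:

```r
library(dplyr)
library(stringr)

# Made-up data; "Email" contains neither keyword
df <- tibble(colA = c("Direct", "Paid social", "Organic social", "Email"))

df <- df %>%
  mutate(
    # str_extract() returns the first match per string (NA if none)
    colB = str_extract(colA, regex("social|direct", ignore_case = TRUE)),
    # normalize any capitalization to "Direct" / "Social"; NA stays NA
    colB = str_to_title(colB)
  )

df
```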

Upvotes: 0

Jamie_B

Reputation: 299

Solution using base R regex functions; it also handles rows where neither "direct" nor "social" is found in the Channel column:

# Dummy data
data <- data.frame(Channel = c("Direct Paid", "Social", "Organic", "Social Organic"),
                   stringsAsFactors = FALSE)

# Use sapply to iterate through each value in the 'Channel' column in the above dataframe
data$groupChannel <- sapply(data$Channel, FUN = function(x){
  # Use base R regex functions for the conditions, and return the value for the new column
  if (grepl("direct", tolower(x))){
    return("Direct")
  }else if (grepl("social", tolower(x))){
    return("Social")
  }else{
    return("Direct or Social Not Found")
  }
})

head(data)
         Channel               groupChannel
1    Direct Paid                     Direct
2         Social                     Social
3        Organic Direct or Social Not Found
4 Social Organic                     Social
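The same if / else if / else chain can also be written without sapply, using nested ifelse calls that operate on the whole column at once. A sketch on the same dummy data, with ignore.case = TRUE standing in for tolower():

```r
# Same dummy data as above
data <- data.frame(Channel = c("Direct Paid", "Social", "Organic", "Social Organic"),
                   stringsAsFactors = FALSE)

# Conditions are checked in order: "direct" first, then "social", else the fallback
data$groupChannel <- ifelse(
  grepl("direct", data$Channel, ignore.case = TRUE), "Direct",
  ifelse(grepl("social", data$Channel, ignore.case = TRUE), "Social",
         "Direct or Social Not Found")
)

data
```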

Upvotes: 1

linog

Reputation: 6226

I have a data.table solution based on conditional replacement. It uses grepl but you could use stringr::str_detect if you want:

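library(data.table)

# Example data using the column name from this answer
df <- data.frame(colA = c("Direct", "Paid social", "Organic social"))
setDT(df)

# Default value for every row
df[, groupedChannel := "Social"]

# Conditional replacement: overwrite rows whose colA contains "direct" (any case)
df[grepl("direct", colA, ignore.case = TRUE), groupedChannel := "Direct"]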

(solution is untested)
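Since the question asks specifically about str_detect, here is a sketch of the same idea with dplyr::case_when, which turns each TRUE/FALSE vector from str_detect into a value for the new column. Conditions are tried in order and the first TRUE wins; the final TRUE ~ NA_character_ catch-all is an assumption (use any default you like) for rows matching neither keyword, and the "Email" row is made up:

```r
library(dplyr)
library(stringr)

# Made-up data; "Email" contains neither keyword
df <- tibble(colA = c("Direct", "Paid social", "Organic social", "Email"))

df <- df %>%
  mutate(
    groupedChannel = case_when(
      # str_detect() gives TRUE/FALSE per row; case_when() maps TRUE to a value
      str_detect(colA, regex("direct", ignore_case = TRUE)) ~ "Direct",
      str_detect(colA, regex("social", ignore_case = TRUE)) ~ "Social",
      TRUE ~ NA_character_  # fallback for everything else
    )
  )

df
```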

Upvotes: 1
