Reputation: 707
I have a dataframe with Column A containing values:
**Channel**
Direct
Paid social
Organic social
What I want to do: create a new column (Column B) called groupedChannel, where str_detect searches for a string in Column A and a matching value is written to groupedChannel.
Condition:
IF row in Column A matches regex "direct" THEN Column B value = "Direct" ELSE
IF row in Column A matches regex "social" THEN Column B value = "Social"
AFAIK, str_detect will return only TRUE/FALSE. How can I use the TRUE/FALSE to assign a value in column B?
Upvotes: 0
Views: 3622
Reputation: 21400
Here's a base R solution, which assumes you have a clearly defined set of Channel_group values.
Data:
data <- data.frame(Channel = c("Direct", "Paid social", "Organic social"),
                   stringsAsFactors = FALSE)
You can define your Channel_group values in a vector a:
a <- c("(S|s)ocial", "(D|d)irect")
Now you use sub to replace each Channel value with its Channel_group value; \\U (which requires perl = TRUE) makes sure that these values are returned as upper-case strings (use \\L if you prefer to have lower-case strings):
data$Channel_group <- sub(paste0(".*\\b(", paste(a, collapse = "|"), ")\\b.*"), "\\U\\1", data$Channel, perl = TRUE)
Result:
data
         Channel Channel_group
1         Direct        DIRECT
2    Paid social        SOCIAL
3 Organic social        SOCIAL
Upvotes: 0
Reputation: 8275
What you want is to match your regex, not simply detect.
library(dplyr)
library(stringr)

tibble(
  colA = c("**Channel**", "Direct", "Paid social", "Organic social")
) %>%
  mutate(
    colB = str_match(colA, "[Ss]ocial|[Dd]irect")[,1],
    colB = str_to_lower(colB)
  )
#> # A tibble: 4 x 2
#>   colA           colB
#>   <chr>          <chr>
#> 1 **Channel**    <NA>
#> 2 Direct         direct
#> 3 Paid social    social
#> 4 Organic social social
Created on 2020-04-29 by the reprex package (v0.3.0)
stringr::str_match returns a matrix in which the first column is the full match and any subsequent columns hold the capture groups, so we need to put [,1] at the end of that call. Because the pattern matches both upper- and lower-case versions, we then convert all the matches to lowercase.
Alternatively, you could use str_extract like so: colB = str_extract(colA, "[Ss]ocial|[Dd]irect"), without the [,1].
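As a minimal sketch of the str_extract variant just described, using the same colA values as the reprex (the object names channels and result are made up for illustration):

```r
library(dplyr)
library(stringr)

# Same example values as above; `channels` is an illustrative name
channels <- tibble(
  colA = c("**Channel**", "Direct", "Paid social", "Organic social")
)

result <- channels %>%
  mutate(
    # str_extract returns a character vector (NA where nothing matches),
    # so no [,1] indexing is needed
    colB = str_to_lower(str_extract(colA, "[Ss]ocial|[Dd]irect"))
  )

result
```

Rows that match neither pattern end up as NA in colB, just as with str_match.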
Upvotes: 0
Reputation: 299
A solution using base R regex functions that also handles the case where neither "direct" nor "social" is found in the Channel column:
# Dummy data
data <- data.frame(Channel = c("Direct Paid", "Social", "Organic", "Social Organic"),
                   stringsAsFactors = FALSE)

# Use sapply to iterate through each value in the 'Channel' column of the data frame above
data$groupChannel <- sapply(data$Channel, FUN = function(x){
  # Use base R regex functions to test the conditions and return the value for the new column
  if (grepl("direct", tolower(x))){
    return("Direct")
  }else if (grepl("social", tolower(x))){
    return("Social")
  }else{
    return("Direct or Social Not Found")
  }
})
head(data)
         Channel               groupChannel
1    Direct Paid                     Direct
2         Social                     Social
3        Organic Direct or Social Not Found
4 Social Organic                     Social
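The same conditions can probably also be written without an explicit loop, since grepl is vectorised. The sketch below is an alternative to the sapply version above, not a replacement for it; it swaps tolower() for ignore.case = TRUE and uses nested ifelse(), and the helper names is_direct and is_social are made up for illustration:

```r
# Same dummy data as above
data <- data.frame(Channel = c("Direct Paid", "Social", "Organic", "Social Organic"),
                   stringsAsFactors = FALSE)

# grepl works on the whole column at once; ignore.case replaces tolower()
is_direct <- grepl("direct", data$Channel, ignore.case = TRUE)
is_social <- grepl("social", data$Channel, ignore.case = TRUE)

# "direct" is checked first, so it wins when both patterns match
data$groupChannel <- ifelse(is_direct, "Direct",
                     ifelse(is_social, "Social", "Direct or Social Not Found"))

data
```

This is also the most direct answer to "how do I use TRUE/FALSE": the logical vector from grepl (or str_detect) is passed as the test of ifelse().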
Upvotes: 1
Reputation: 6226
I have a data.table solution based on conditional replacement. It uses grepl but you could use stringr::str_detect if you want:
library(data.table)
setDT(df)
df[, groupedChannel := "Social"]
# Conditional replacement (ignore.case = TRUE so that "Direct" also matches)
df[grepl("direct", colA, ignore.case = TRUE), groupedChannel := "Direct"]
(solution is untested)
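For reference, here is a self-contained sketch of the same conditional-replacement idea using stringr::str_detect. The example data is assumed from the question, df and colA simply follow the names used above, and, unlike the snippet above, rows matching neither pattern are left as NA instead of defaulting to "Social":

```r
library(data.table)
library(stringr)

# Example data assumed from the question
df <- data.table(colA = c("Direct", "Paid social", "Organic social"))

# str_detect returns TRUE/FALSE per row; data.table uses it to pick the rows to update.
# Rows that match neither pattern keep NA in groupedChannel.
df[str_detect(colA, regex("social", ignore_case = TRUE)), groupedChannel := "Social"]

# Assigned last so that "direct" takes precedence, as in the question's IF/ELSE
df[str_detect(colA, regex("direct", ignore_case = TRUE)), groupedChannel := "Direct"]

df
```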
Upvotes: 1