Reputation: 55
I would like to know how to replace strings in a dataset based on different conditions, and then group the results together with dplyr.
For example,
The reason I treat FRAUD and NARC differently is that I think there is a difference between NARC-SELL and NARC-POSSESS (the kinds of drugs that are involved are not important).
Thanks for the help!
Upvotes: 0
Views: 57
Reputation: 21284
You can also use str_extract() from stringr:
# using Weihuang Wong's nice example data
library(dplyr)
library(stringr)
d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"))
pattern <- "^(NARC-\\w+|FRAUD|HOMICIDE-\\w+-\\w+)"
d %>% mutate(y = str_extract(x, pattern))
                         x                 y
1        FRAUD-CREDIT CARD             FRAUD
2        HOMICIDE-JUST-GUN HOMICIDE-JUST-GUN
3 NARC-POSSESS-PILL/TABLET      NARC-POSSESS
4         NARC-SELL-HEROIN         NARC-SELL
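One caveat: str_extract() returns NA for any value the pattern does not match, so every category you want to keep has to appear in the pattern. Below is a minimal sketch of a fallback, assuming you only want to shorten the FRAUD and NARC codes and leave everything else as it is. The "THEFT-RETAIL" row and the name res are made up for illustration and are not from the question.

```r
library(dplyr)
library(stringr)

d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN",
                      "THEFT-RETAIL"),  # hypothetical extra category
                stringsAsFactors = FALSE)

# str_extract() gives NA where the pattern does not match;
# coalesce() then falls back to the original string in x.
res <- d %>%
  mutate(y = coalesce(str_extract(x, "^(NARC-\\w+|FRAUD)"), x))
```

With this, only the FRAUD and NARC rows are shortened, and the pattern does not need a separate branch for each remaining category.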
Upvotes: 0
Reputation: 13128
You'll want to use a regex like NARC-[A-Z]+|FRAUD: NARC followed by a dash followed by one or more capital letters, or FRAUD. Anything after the match is dropped, and strings that don't match are left unchanged.
library(dplyr)
d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"))
d %>%
  mutate(y = gsub("^(NARC-[A-Z]+|FRAUD).*", "\\1", x))
#                          x                 y
# 1        FRAUD-CREDIT CARD             FRAUD
# 2        HOMICIDE-JUST-GUN HOMICIDE-JUST-GUN
# 3 NARC-POSSESS-PILL/TABLET      NARC-POSSESS
# 4         NARC-SELL-HEROIN         NARC-SELL
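For the grouping part of the question, here is a rough sketch of how the recoded column could be summarised. It assumes a simple count per category is what you are after. The extra "NARC-SELL-COCAINE" row and the name counts are made up for illustration.

```r
library(dplyr)

d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN",
                      "NARC-SELL-COCAINE"),  # hypothetical extra row
                stringsAsFactors = FALSE)

counts <- d %>%
  # collapse FRAUD-* to FRAUD and NARC-<TYPE>-* to NARC-<TYPE>
  mutate(y = gsub("^(NARC-[A-Z]+|FRAUD).*", "\\1", x)) %>%
  # count(y) is shorthand for group_by(y) %>% summarise(n = n())
  count(y)
```

Here the two NARC-SELL rows end up in one group with n = 2, while the other categories each have n = 1. If you need other summaries, swap count(y) for group_by(y) followed by summarise().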
Upvotes: 3