Chloe

Reputation: 55

How do I extract strings based on certain conditions?

I would like to know how to replace strings in a dataset based on different conditions and then group the results together with dplyr.
For example,

Description of how I want to extract from the given dataset

The reason I treat FRAUD and NARC differently is that I think there is a difference between NARC-SELL and NARC-POSSESS (the kinds of drugs involved are not important).
Thanks for the help!

Upvotes: 0

Views: 57

Answers (2)

andrew_reece

Reputation: 21284

You can also use str_extract(), from stringr:

# using Weihuang Wong's nice example data

library(dplyr)
library(stringr)

d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"))

pattern <- "^(NARC-\\w+|FRAUD|HOMICIDE-\\w+-\\w+)"

d %>% mutate(y = str_extract(x, pattern))

                         x                 y
1        FRAUD-CREDIT CARD             FRAUD
2        HOMICIDE-JUST-GUN HOMICIDE-JUST-GUN
3 NARC-POSSESS-PILL/TABLET      NARC-POSSESS
4         NARC-SELL-HEROIN         NARC-SELL
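
One thing to watch for: str_extract() returns NA for any row the pattern does not match, so every category you want to keep whole has to appear in the pattern. A minimal sketch of a fallback, assuming you would rather keep the original string in that case (the THEFT-AUTO row is a made-up value, not from the question's data):

```r
library(dplyr)
library(stringr)

# "THEFT-AUTO" is a hypothetical category added for illustration
d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "NARC-SELL-HEROIN",
                      "THEFT-AUTO"),
                stringsAsFactors = FALSE)

# Only list the categories that should be shortened
pattern <- "^(NARC-\\w+|FRAUD)"

# coalesce() takes the extracted match where there is one,
# and falls back to the original string where str_extract() gave NA
d <- d %>% mutate(y = coalesce(str_extract(x, pattern), x))
```

With this approach the pattern only needs the categories you want to shorten, and everything else passes through unchanged.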

Upvotes: 0

Weihuang Wong

Reputation: 13128

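You'll want to use a regex like ^(NARC-[A-Z]+|FRAUD).*: at the start of the string, capture either NARC followed by a dash and one or more capital letters, or FRAUD, then replace the whole string with the captured group. Strings that match neither alternative, such as HOMICIDE-JUST-GUN, are left unchanged.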

library(dplyr)
d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"))
d %>%
  mutate(y = gsub("^(NARC-[A-Z]+|FRAUD).*", "\\1",  x))
#                          x                 y
# 1        FRAUD-CREDIT CARD             FRAUD
# 2        HOMICIDE-JUST-GUN HOMICIDE-JUST-GUN
# 3 NARC-POSSESS-PILL/TABLET      NARC-POSSESS
# 4         NARC-SELL-HEROIN         NARC-SELL
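
For the grouping part of the question, one possible follow-up is to count rows per recoded category. This is only a sketch: the two extra rows are hypothetical, added so that some groups have more than one member, and count() is just one way to group (group_by() plus summarise() would also work).

```r
library(dplyr)

d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "FRAUD-IDENTITY THEFT",      # hypothetical row
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN",
                      "NARC-SELL-COCAINE"),        # hypothetical row
                stringsAsFactors = FALSE)

counts <- d %>%
  # same recoding as above: shorten NARC-* and FRAUD-*, leave the rest alone
  mutate(y = gsub("^(NARC-[A-Z]+|FRAUD).*", "\\1", x)) %>%
  # one row per recoded category, with the number of rows in column n
  count(y)
```

Here counts has one row per category in y, so both FRAUD rows end up in one group and both NARC-SELL rows in another.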

Upvotes: 3
