Reputation: 17
Good day. I want to gsub everything in each string with " " except INDIVIDUAL/BUSINESS, then mutate the result into a new column called business_type. I've tried many methods, but they all failed. Thanks in advance.
text <- c("|Name:James Indiana|type:INDIVIDUAL|Id::G123456789&M|Location:Indonesia|", "|Name:James Bond|type:BUSINESS|Id::G&987654321M|Location:Indonesia|")
The expected output looks like this:
business_type
INDIVIDUAL
BUSINESS
I am using
mutate(business_type = gsub("[^(\\bINDIVIDUAL\\b)(\\bBUSINESS\\b)]+"," ",x))
This removes most of the other text, but it leaves behind some uppercase letters from the other fields.
mutate(business_type = gsub("^/(?!INDIVIDUAL$)(?!BUSINESS$)[a-z0-9A-Z:&|]+=$"," ",x))
does not work either. I also tried the regex ^/(?!ignoreme)([a-z0-9]+)$ but it's not working.
Upvotes: 1
Views: 350
Reputation: 627537
You can use
mutate(business_type = gsub("\\b(?:INDIVIDUAL|BUSINESS)\\b(*SKIP)(*F)|(?s)."," ",x, perl=TRUE))
See the regex demo.
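To see what this call returns on the sample data from the question, here is a minimal sketch in base R. The trimws() step is an addition, not part of the original call: the pattern replaces every other character with a single space, so the kept word comes back padded with spaces on both sides.

```r
text <- c(
  "|Name:James Indiana|type:INDIVIDUAL|Id::G123456789&M|Location:Indonesia|",
  "|Name:James Bond|type:BUSINESS|Id::G&987654321M|Location:Indonesia|"
)

# Replace every character that is not part of one of the two whole words
# with a single space (perl = TRUE is required for (*SKIP)(*F))
spaced <- gsub("\\b(?:INDIVIDUAL|BUSINESS)\\b(*SKIP)(*F)|(?s).", " ", text, perl = TRUE)

# Assumption: the padding spaces are not wanted, so strip them
business_type <- trimws(spaced)
business_type
```

Inside mutate(), the same two steps can be nested as trimws(gsub(...)).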
Regex details:
\b(?:INDIVIDUAL|BUSINESS)\b - match either INDIVIDUAL or BUSINESS as whole words
(*SKIP)(*F) - skip the match and go on matching from the failure location
| - or
(?s). - match any char including line break chars ((?s) is a singleline flag that makes . match any chars in a PCRE regex).
Upvotes: 2
Reputation: 389325
You can use str_extract to extract the words that you are interested in.
stringr::str_extract(text, 'INDIVIDUAL|BUSINESS')
#[1] "INDIVIDUAL" "BUSINESS"
In base R,
regmatches(text, regexpr('INDIVIDUAL|BUSINESS', text))
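One caveat with the base R version: regmatches() with regexpr() drops elements that have no match, so the result can be shorter than the input and will not fit into a column. A minimal sketch that keeps the length and fills non-matches with NA, assuming a hypothetical third row that has no type field:

```r
text <- c(
  "|Name:James Indiana|type:INDIVIDUAL|Id::G123456789&M|Location:Indonesia|",
  "|Name:James Bond|type:BUSINESS|Id::G&987654321M|Location:Indonesia|",
  "|Name:No Type|Location:Indonesia|"  # hypothetical row, not in the question
)

# regexpr() returns -1 for elements without a match
m <- regexpr("INDIVIDUAL|BUSINESS", text)

# Start from NA and fill in only the positions that matched
business_type <- rep(NA_character_, length(text))
business_type[m != -1] <- regmatches(text, m)
business_type
```

stringr::str_extract already returns NA for non-matching elements, so it needs no such workaround.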
Upvotes: 2