Afiq

Reputation: 17

Replace everything in a string except certain words using R

Good day. I want to use gsub to replace everything in each string with " " except the word INDIVIDUAL or BUSINESS, then use mutate to store the result in a new column called business_type. I've tried several approaches without success. Thanks in advance.

text <- c("|Name:James Indiana|type:INDIVIDUAL|Id::G123456789&M|Location:Indonesia|", "|Name:James Bond|type:BUSINESS|Id::G&987654321M|Location:Indonesia|")

The expected output looks like this:

business_type    
INDIVIDUAL    
BUSINESS

I am using

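mutate(business_type = gsub("[^(\\bINDIVIDUAL\\b)(\\bBUSINESS\\b)]+"," ",x))

This removes most of the other text but keeps stray uppercase letters from the other fields, because the bracket expression is a character class (a set of single characters), not a list of words.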

mutate(business_type = gsub("^/(?!INDIVIDUAL$)(?!BUSINESS$)[a-z0-9A-Z:&|]+=$"," ",x))

does not work either. I also tried the regex ^/(?!ignoreme)([a-z0-9]+)$, but that did not work.

Upvotes: 1

Views: 350

Answers (2)

Wiktor Stribiżew

Reputation: 627537

You can use

mutate(business_type = gsub("\\b(?:INDIVIDUAL|BUSINESS)\\b(*SKIP)(*F)|(?s)."," ",x, perl=TRUE))

See the regex demo.

Regex details:

  • \b(?:INDIVIDUAL|BUSINESS)\b - matches either INDIVIDUAL or BUSINESS as a whole word
  • (*SKIP)(*F) - discards that match and resumes matching from the position where the failure occurred (right after the word)
  • | - or
  • (?s). - matches any single char, including line break chars ((?s) is the single-line flag that makes . match line breaks too in a PCRE regex).
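As a rough check of the pattern above, here is a minimal base R sketch run on the sample vector from the question. Because every other character becomes a space, the kept word ends up surrounded by padding; the trimws call is an extra step I am assuming you want, it is not part of the original gsub call, and the variable names are only illustrative.

```r
# Sample data copied from the question.
text <- c(
  "|Name:James Indiana|type:INDIVIDUAL|Id::G123456789&M|Location:Indonesia|",
  "|Name:James Bond|type:BUSINESS|Id::G&987654321M|Location:Indonesia|"
)

# Keep INDIVIDUAL/BUSINESS, turn every other character into a space.
# perl = TRUE is required for (*SKIP)(*F) and the inline (?s) flag.
pattern <- "\\b(?:INDIVIDUAL|BUSINESS)\\b(*SKIP)(*F)|(?s)."
padded <- gsub(pattern, " ", text, perl = TRUE)

# Assumption: the surrounding spaces are not wanted, so strip them.
business_type <- trimws(padded)
print(business_type)
```

Note that padded has the same number of characters as the input, since each removed character is replaced by exactly one space.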

Upvotes: 2

Ronak Shah

Reputation: 389325

You can use str_extract to extract the words that you are interested in.

stringr::str_extract(text, 'INDIVIDUAL|BUSINESS')
#[1] "INDIVIDUAL" "BUSINESS" 

In base R,

regmatches(text, regexpr('INDIVIDUAL|BUSINESS', text))
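One thing to watch with the base R version: regmatches drops elements that have no match, so the result can be shorter than the input and will not fit into a column directly. Below is a small sketch of one way to keep the lengths aligned by filling non-matches with NA. The third string is a made-up row without a type field, added only to show the non-match case, and df and business_type are illustrative names.

```r
# First two strings are from the question; the third is a hypothetical
# row with no INDIVIDUAL/BUSINESS value.
text <- c(
  "|Name:James Indiana|type:INDIVIDUAL|Id::G123456789&M|Location:Indonesia|",
  "|Name:James Bond|type:BUSINESS|Id::G&987654321M|Location:Indonesia|",
  "|Name:No Type|Location:Indonesia|"
)

# regexpr returns -1 for elements without a match.
m <- regexpr("\\b(?:INDIVIDUAL|BUSINESS)\\b", text, perl = TRUE)

# Start with NA everywhere, then fill in only the rows that matched.
business_type <- rep(NA_character_, length(text))
business_type[m != -1] <- regmatches(text, m)

df <- data.frame(text = text, business_type = business_type,
                 stringsAsFactors = FALSE)
print(df$business_type)
```

stringr::str_extract already returns NA for non-matching elements, so this extra bookkeeping is only needed if you stay in base R.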

Upvotes: 2
