I have a query in which I want to wrap every word in double quotes while ignoring certain attributes, but I also want to skip words that are already in double quotes.
I am already ignoring has, from, to, sample, etc., but I can't get the regex to skip words that are inside double quotes.
Could anyone please nudge me in the right direction?
Current regex:
\b(?!\bOR\b)\b(?!\bAND\b)\b(?!\bfrom:\b)\b(?!\bto:\b)\b(?!\bhas:\b)\b(?!\bsample\b)\w+\b
Query:
(@harrys OR from:harrys OR to:harrys OR ("harry's" OR harrys) AND (razor OR razors OR shave OR shaving OR shaved OR shaver OR subscription OR razorhead OR razorheads OR buy OR bought OR buying OR boxers OR cover) AND (has:geo OR has:profile_geo) -styles -prince -markle -meghanmarkle)
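A quick check shows why the regex above cannot cope with quoted words. This assumes a Python regex engine (the question does not name a flavor): the pattern has no notion of quotes at all, and since the apostrophe is a non-word character, the quoted phrase "harry's" is split into two separate "words" that each get matched and wrapped. The variable names below are illustrative only.

```python
import re

# The regex from the question, unchanged
question_pattern = r'\b(?!\bOR\b)\b(?!\bAND\b)\b(?!\bfrom:\b)\b(?!\bto:\b)\b(?!\bhas:\b)\b(?!\bsample\b)\w+\b'

# A small excerpt of the query that contains an already-quoted phrase
snippet = "(\"harry's\" OR harrys)"

# Word boundaries sit around the apostrophe, so the quoted phrase yields two matches
question_matches = re.findall(question_pattern, snippet)
print(question_matches)  # ['harry', 's', 'harrys']

# Wrapping every match in quotes therefore mangles the already-quoted phrase
question_result = re.sub(question_pattern, r'"\g<0>"', snippet)
print(question_result)  # (""harry"'"s"" OR "harrys")
```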
You can match and capture all your exceptions in Group 1, and just match the words you want to quote. Then, when replacing, check whether Group 1 participated in the match: if it did, put the matched text back unchanged; otherwise, wrap the match in double quotes.
Here is what I mean:
import re
text = r"""(@harrys OR from:harrys OR to:harrys OR ("harry's" OR harrys) AND (razor OR razors OR shave OR shaving OR shaved OR shaver OR subscription OR razorhead OR razorheads OR buy OR bought OR buying OR boxers OR cover) AND (has:geo OR has:profile_geo) -styles -prince -markle -meghanmarkle)"""
pattern = r'("[^"]*"|\b(?:(?:OR|AND|sample)\b|(?:from|to|has):))|\w+'
# If Group 1 (an exception) matched, keep it as is; otherwise wrap the word in double quotes
print(re.sub(pattern, lambda m: m.group(1) if m.group(1) else f'"{m.group()}"', text))
Output:
(@"harrys" OR from:"harrys" OR to:"harrys" OR ("harry's" OR "harrys") AND ("razor" OR "razors" OR "shave" OR "shaving" OR "shaved" OR "shaver" OR "subscription" OR "razorhead" OR "razorheads" OR "buy" OR "bought" OR "buying" OR "boxers" OR "cover") AND (has:"geo" OR has:"profile_geo") -"styles" -"prince" -"markle" -"meghanmarkle")
See the Python demo. See also the regex demo (all green matches are kept, all blue matches are enclosed with double quotes).
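If the list of exceptions changes often, the pattern can be built from lists instead of being hard-coded. The helper below is a minimal sketch of that idea; `quote_terms` and its `operators`/`fields` parameters are hypothetical names, not part of any library. It escapes each keyword with `re.escape`, keeps Group 1 matches unchanged and wraps every other word in double quotes.

```python
import re

def quote_terms(query, operators=("OR", "AND", "sample"), fields=("from", "to", "has")):
    """Hypothetical helper: wrap every bare word in double quotes, leaving
    quoted phrases, the given operators and field prefixes (e.g. from:) as is.
    Both tuples are assumed to be non-empty."""
    ops = "|".join(map(re.escape, operators))
    prefixes = "|".join(map(re.escape, fields))
    # Group 1 captures the exceptions; any other word is matched by \w+
    pattern = rf'("[^"]*"|\b(?:(?:{ops})\b|(?:{prefixes}):))|\w+'
    # Group 1 is either None or a non-empty string, so `or` picks the right branch
    return re.sub(pattern, lambda m: m.group(1) or f'"{m.group()}"', query)

print(quote_terms('(from:harrys OR "harry\'s") AND has:geo -styles'))
# (from:"harrys" OR "harry's") AND has:"geo" -"styles"
```

Since `\w+` also matches digits and underscores, a term like profile_geo is quoted as one word, while `@` and `-` prefixes stay outside the quotes.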
Regex details:
- ("[^"]*"|\b(?:(?:OR|AND|sample)\b|(?:from|to|has):)) - Group 1 (the exceptions; this text is kept as is):
  - "[^"]*" - a ", zero or more chars other than ", and a " char
  - | - or
  - \b - a word boundary
  - (?: - start of the non-capturing group
    - (?:OR|AND|sample)\b - OR, AND or sample, followed by a word boundary
    - | - or
    - (?:from|to|has): - from, to or has, followed by a colon
  - ) - end of the non-capturing group
- | - or
- \w+ - one or more word chars.
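The breakdown above can also live next to the code itself. This is a sketch assuming Python's `re.VERBOSE` flag, which ignores whitespace and `#` comments in the pattern (outside character classes). The annotated pattern should match exactly what the compact one-liner matches, and the loop labels each match as kept or wrapped. The names `verbose_pattern`, `compact_pattern` and `labelled` are illustrative.

```python
import re

# The same pattern, annotated piece by piece
verbose_pattern = re.compile(r'''
    (                               # Group 1: exceptions, kept as is
        "[^"]*"                     #   a double-quoted phrase
      | \b (?:                      #   or, at a word boundary,
            (?:OR|AND|sample)\b     #     OR, AND or sample as whole words
          | (?:from|to|has):        #     or a from:, to: or has: prefix
        )
    )
  | \w+                             # any other word: wrap it in quotes
''', re.VERBOSE)

compact_pattern = re.compile(r'("[^"]*"|\b(?:(?:OR|AND|sample)\b|(?:from|to|has):))|\w+')

query = "(\"harry's\" OR harrys) AND has:geo"

# Label each match: Group 1 matches are kept, plain \w+ matches get wrapped
labelled = [("keep" if m.group(1) else "wrap", m.group()) for m in verbose_pattern.finditer(query)]
for kind, matched in labelled:
    print(kind, matched)
```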