Reputation: 43
I'm trying to remove specific double quotes from text using a regular expression in Python. I would like to keep only the double quotes that indicate inches, that is, any double quote that directly follows a number.
txt = 'measurement 1/2" and 3" "remove" end" a " multiple"""'
Expected output:
measurement 1/2" and 3" remove end a multiple
This is the closest I've got.
re.sub(r'[^(?!\d+/\d+")]"+', '', txt)
Upvotes: 1
Views: 199
Reputation: 43169
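Simply use a negative lookbehind, which matches any run of double quotes that is not directly preceded by a digit: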
(?<!\d)"+
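As a quick check, here is a minimal sketch that applies the lookbehind pattern with the standard `re` module to the sample string from the question. The variable names are only illustrative.

```python
import re

# Sample string from the question.
txt = 'measurement 1/2" and 3" "remove" end" a " multiple"""'

# (?<!\d) is a negative lookbehind: the run of quotes must not be
# directly preceded by a digit, so 1/2" and 3" are left alone.
cleaned = re.sub(r'(?<!\d)"+', '', txt)
print(cleaned)
```

The quote that stood alone between two spaces is removed but both spaces stay, so the result has a double space before `multiple`. Collapse the whitespace afterwards if that matters.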
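Your original expression [^(?!\d+/\d+")] is a negated character class, so the lookahead syntax inside the brackets loses its special meaning. It basically meant: not (, ?, !, etc.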
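To see why the attempt in the question misbehaves, the following sketch (inputs chosen only for illustration) shows that the bracketed part matches one literal character outside the set, and that this character is then deleted along with the quotes.

```python
import re

# The asker's pattern: everything between [^ and ] is a set of literal
# characters (plus the \d class), not a lookahead.
attempt = r'[^(?!\d+/\d+")]"+'

# The class alone matches any single character outside the set.
outside = re.findall(r'[^(?!\d+/\d+")]', 'a(?!1+/")b')
print(outside)  # ['a', 'b']

# Because the class consumes one character, that character is removed
# together with the quotes that follow it.
print(re.sub(attempt, '', 'end"'))  # en
```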
Alternatively, you could use the regex module with (*SKIP)(*FAIL):
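import regex as re  # third-party "regex" module, not the standard library re

junk = '''measurement 1/2" and 3" "remove" end" a " multiple"""
ABC2DEF3"'''

# Left branch: match a measurement such as 3" or 1/2", then discard that match
# with (*SKIP)(*FAIL) so its quote is left alone. Right branch: any other quotes.
rx = re.compile(r'\b\d(?:/\d+)?"(*SKIP)(*FAIL)|"+')
cleaned = rx.sub('', junk)
print(cleaned)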
Which would yield
measurement 1/2" and 3" remove end a multiple
ABC2DEF3
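If installing the third-party regex module is not an option, a similar effect can be sketched with the standard `re` module. Note that this is a different technique from (*SKIP)(*FAIL): the part to keep is captured in a group and a replacement function puts it back. The names `KEEP_OR_DROP` and `strip_non_inch_quotes` are hypothetical, chosen for this example.

```python
import re

# Left branch captures a measurement such as 3" or 1/2" (to be kept);
# right branch matches any other run of double quotes (to be dropped).
KEEP_OR_DROP = re.compile(r'(\b\d(?:/\d+)?")|"+')

def strip_non_inch_quotes(text: str) -> str:
    """Remove double quotes unless they close a measurement."""
    # group(1) is None when the right branch matched, so return ''.
    return KEEP_OR_DROP.sub(lambda m: m.group(1) or '', text)

print(strip_non_inch_quotes('1/2" and 3" "remove" ABC2DEF3"'))
```

As with the original pattern, the `\b` keeps the quote after `ABC2DEF3` from being treated as an inch mark, because the digit there sits in the middle of a word.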
Upvotes: 2