Greg

Reputation: 43

Using regex in python to remove double quotes with exclusions

I'm trying to remove specific double quotes from text using a regular expression in Python. I would like to keep only the double quotes that indicate inches, that is, any double quote that directly follows a number.

txt = 'measurement 1/2" and 3" "remove" end" a " multiple"""'

Expected output: measurement 1/2" and 3" remove end a multiple

This is the closest I've got.

re.sub(r'[^(?!\d+/\d+")]"+', '', txt)

Upvotes: 1

Views: 199

Answers (1)

Jan

Reputation: 43169

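Simply use a negative lookbehind, which matches any run of double quotes that is not directly preceded by a digit: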

(?<!\d)"+

See a demo on regex101.com.
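
As a minimal sketch with the standard re module, here is that pattern applied to the sample string from the question (the variable name cleaned is only illustrative):

```python
import re

txt = 'measurement 1/2" and 3" "remove" end" a " multiple"""'

# (?<!\d) is a negative lookbehind: the run of quotes must not be
# directly preceded by a digit, so 1/2" and 3" are left alone.
cleaned = re.sub(r'(?<!\d)"+', '', txt)
print(cleaned)  # measurement 1/2" and 3" remove end a  multiple
```

The isolated quote between "a" and "multiple" leaves two spaces behind. If that matters, something like ' '.join(cleaned.split()) collapses the whitespace afterwards.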


Your original expression

[^(?!\d+/\d+")]

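is a negated character class, so the lookahead syntax inside it is taken literally: it matches any single character that is not (, ?, !, a digit, +, /, " or ). That character is then removed together with the quotes that follow it.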
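
To make the effect visible, here is a small sketch that runs the expression from the question on its own sample string with the standard re module (the variable name broken is only illustrative):

```python
import re

txt = 'measurement 1/2" and 3" "remove" end" a " multiple"""'

# Inside [...] the characters (, ?, ! and so on are plain literals, so the
# pattern reads "one character outside the set, then one or more quotes".
broken = re.sub(r'[^(?!\d+/\d+")]"+', '', txt)
print(broken)  # measurement 1/2" and 3"remov en a multipl
```

The quotes after digits survive only because digits happen to be in the excluded set. Everywhere else, the character in front of the quotes is deleted along with them.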


Alternatively, you could use the newer regex module (a third-party package, not part of the standard library), which supports (*SKIP)(*FAIL):

import regex as re

junk = '''measurement 1/2" and 3" "remove" end" a " multiple"""
ABC2DEF3"'''

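# Skip quotes that follow a standalone number (e.g. 3" or 1/2"); remove all others.
# \d+ rather than \d also keeps the quote after multi-digit numbers such as 12".
rx = re.compile(r'\b\d+(?:/\d+)?"(*SKIP)(*FAIL)|"+')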

cleaned = rx.sub('', junk)
print(cleaned)

Which would yield

measurement 1/2" and 3" remove end a  multiple
ABC2DEF3
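
If installing a third-party package is not an option, a similar effect can be sketched with the standard re module by capturing the part to keep and putting it back in a replacement callback. This is a swapped-in technique, not (*SKIP)(*FAIL) itself. The keep-pattern is adapted from the one above and uses \d+ on the assumption that multi-digit values such as 12" should be kept as well:

```python
import re

junk = '''measurement 1/2" and 3" "remove" end" a " multiple"""
ABC2DEF3"'''

# Group 1 captures what should survive (a standalone number plus its quote);
# the second alternative matches every other run of quotes.
rx = re.compile(r'(\b\d+(?:/\d+)?")|"+')

# Put group 1 back when it matched, otherwise replace with nothing.
cleaned = rx.sub(lambda m: m.group(1) or '', junk)
print(cleaned)
```

On the sample input this prints the same two lines as shown above. Unlike the plain lookbehind, the \b keeps the quote only after a standalone number, so the quote after ABC2DEF3 is removed.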

Upvotes: 2
