Reputation: 8154
It's been years (and years) since I've done any regex, so I'm turning to the experts here; it's likely a trivial exercise :)
I have a tab-delimited file, and on each line certain fields have values such as b'bar foo' or b"foo's bar".
A complete line in the file might look something like:
123\t b'bar foo' \tabc\t123\r\n
I want to get rid of the leading b' or b" and the trailing ' or " in that field on every line. So, given the example line above, after running the regex I'd get:
123\t bar foo \tabc\t123\r\n
Bonus points if you can give me the Python snippet to run this over the whole file.
Upvotes: 0
Views: 1859
Reputation: 1072
(^|\t)b[\"'] should match the leading b' or b", and for the trailing quote:
[\"'](\t|$) should do it.
In Python, you do:
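import re
# leading b' or b" at the start of the line or right after a tab
r1 = re.compile(r"(^|\t)b[\"']")
# trailing ' or " right before a tab or at the end of the line
r2 = re.compile(r"[\"'](\t|$)")
then just use (sub() returns a new string, so keep the result):
yourString = r1.sub(r"\1", yourString)
yourString = r2.sub(r"\1", yourString)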
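A quick sanity check of those two patterns on the example line from the question. As written, they only match when the b' and the closing quote sit directly against the tab, so the padded example passes through unchanged. The r1_spaces / r2_spaces patterns are a hypothetical tweak, assuming only plain spaces can pad a field, that capture those spaces and put them back. The line ending is stripped first because $ matches before a final \n but not before \r.

```python
import re

# Patterns from the answer: b' / b" right after a tab (or at line start),
# and a closing quote right before a tab (or at line end).
r1 = re.compile(r"(^|\t)b[\"']")
r2 = re.compile(r"[\"'](\t|$)")

# Hypothetical variant (assumption: only plain spaces pad the field):
# capture the spaces in a group so the replacement can put them back.
r1_spaces = re.compile(r"(^|\t)( *)b[\"']")
r2_spaces = re.compile(r"[\"']( *)(\t|$)")

# Example line from the question, with its \r\n line ending removed.
line = "123\t b'bar foo' \tabc\t123"

# sub() returns a new string, so the calls are chained.
as_written = r2.sub(r"\1", r1.sub(r"\1", line))
with_spaces = r2_spaces.sub(r"\1\2", r1_spaces.sub(r"\1\2", line))

print(as_written == line)  # True: the spaces stop both patterns matching
print(repr(with_spaces))   # '123\t bar foo \tabc\t123'
```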
Upvotes: 1
Reputation: 342383
>>> "b\"foo's bar\"".replace('b"',"").replace("b'","").rstrip("\"'")
"foo's bar"
>>> "b'bar foo'".replace('b"',"").replace("b'","").rstrip("\"'")
'bar foo'
>>>
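That one-liner works on a single field value. To apply the same idea to a whole tab-delimited line, one option is to split on tabs and clean each field on its own. In the sketch below, clean_field and clean_line are hypothetical helpers; clean_field checks the prefix and suffix with slicing instead of using replace(), so a b' occurring inside the text is left alone, and it only strips when the closing quote matches the opening one (a stricter rule than the rstrip() above).

```python
def clean_field(field):
    """Strip a b'...' or b"..." wrapper from one field, keeping padding spaces.

    Hypothetical helper: only touches the prefix/suffix, and only when the
    closing quote matches the opening one.
    """
    core = field.strip(" ")
    if len(core) >= 3 and core[:2] in ("b'", 'b"') and core[-1] == core[1]:
        lead = field[: len(field) - len(field.lstrip(" "))]   # leading spaces
        trail = field[len(field.rstrip(" ")):]               # trailing spaces
        return lead + core[2:-1] + trail
    return field


def clean_line(line):
    """Apply clean_field to every tab-separated field, keeping the line ending."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    return "\t".join(clean_field(f) for f in body.split("\t")) + ending


print(repr(clean_line("123\t b'bar foo' \tabc\t123\r\n")))
# '123\t bar foo \tabc\t123\r\n'
```

To run it over the file, loop over the input lines and write clean_line(line) for each one; opening both files with newline='' keeps the \r\n endings intact.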
Upvotes: 0
Reputation: 70743
For each line you can use:
re.sub(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''', r'\2', line)
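And for the bonus points:
import re
# b'...' or b"..." field bounded by line start/end, a tab or a newline;
# group 2 is the text between the quotes
pattern = re.compile(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''')
with open('infile') as infile, open('outfile', 'w') as outfile:
    for line in infile:
        outfile.write(pattern.sub(r'\2', line))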
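To see what that pattern does without touching real files, the sketch below feeds the same loop from io.StringIO objects standing in for 'infile' and 'outfile' (that substitution is mine, not part of the answer). It uses the example line from the question with a plain \n ending, plus a made-up line with two quoted fields. Note that the \W* parts also consume the spaces around b'bar foo', so the output has bar foo directly between the tabs rather than the padded form shown in the question.

```python
import io
import re

# Pattern from the answer above (non-greedy version).
pattern = re.compile(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''')

lines = [
    "123\t b'bar foo' \tabc\t123\n",  # example line from the question
    "b'x'\tb\"foo's bar\"\n",          # made-up line with two quoted fields
]

# In-memory stand-ins for 'infile' and 'outfile'.
infile = io.StringIO("".join(lines))
outfile = io.StringIO()

for line in infile:
    outfile.write(pattern.sub(r'\2', line))

print(repr(outfile.getvalue()))
# "123\tbar foo\tabc\t123\nx\tfoo's bar\n"
```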
Upvotes: 1