TMC

Reputation: 8154

Regex + Python to remove specific leading and trailing characters from a value in a tab-delimited file

It's been years (and years) since I've done any regex, so I'm turning to the experts on here, since it's likely a trivial exercise :)

I have a tab-delimited file, and on each line certain fields have values such as b'bar foo'.

A complete line in the file might look something like:

123\t b'bar foo' \tabc\t123\r\n

I want to strip the leading b' or b" and the trailing ' or " from that field on every line. So given the example line above, after running the regex, I'd get:

123\t bar foo \tabc\t123\r\n

Bonus points if you can give me the Python blurb to run this over the file.

Upvotes: 0

Views: 1859

Answers (3)

Aaron

Reputation: 1072

(^|\t)b[\"'] should match the leading part, and for the trailing part:

[\"'](\t|$) should do it

In Python, you do:

import re
r1 = re.compile("(^|\t)b[\"']")
r2 = re.compile("[\"'](\t|$)")

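then chain the two substitutions, feeding the result of the first into the second (sub returns a new string and leaves yourString unchanged):

result = r1.sub("\\1", yourString)
result = r2.sub("\\1", result)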
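As a quick check of the two patterns above, here is a minimal sketch that applies them in sequence to one sample line. The helper name strip_bytes_quotes is made up for illustration, and the sample has no padding spaces around the field, since these patterns only match a quote that sits directly next to a tab or at the start or end of the string.

```python
import re

# Same patterns as above, written as raw strings.
r1 = re.compile(r'''(^|\t)b["']''')   # leading b' or b" at the start of a field
r2 = re.compile(r'''["'](\t|$)''')    # trailing ' or " at the end of a field


def strip_bytes_quotes(line):
    # Hypothetical helper: run both substitutions, keeping the captured tab.
    line = r1.sub(r"\1", line)
    return r2.sub(r"\1", line)


print(repr(strip_bytes_quotes("123\tb'bar foo'\tabc\t123")))
```

If the fields really are padded with spaces, as in the example line in the question, the patterns would need something like optional whitespace next to the tab before they match.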

Upvotes: 1

ghostdog74

Reputation: 342383

>>> "b\"foo's bar\"".replace('b"',"").replace("b'","").rstrip("\"'")
"foo's bar"
>>> "b'bar foo'".replace('b"',"").replace("b'","").rstrip("\"'")
'bar foo'
>>>
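One caveat with the chained replace calls above: they remove b' and b" wherever they occur, so a value such as b'Bob's car' would also lose the b' inside Bob's. A minimal sketch of a stricter variant that only touches the start and end of each field follows; the names clean_field and clean_line are hypothetical, and it assumes the fields carry no padding spaces.

```python
def clean_field(value):
    # Hypothetical helper: drop one leading b-plus-quote prefix and the
    # matching closing quote, and leave everything in between alone.
    for quote in ("'", '"'):
        prefix = "b" + quote
        if (value.startswith(prefix) and value.endswith(quote)
                and len(value) > len(prefix)):
            return value[len(prefix):-1]
    return value


def clean_line(line):
    # Split off the line ending so it can be put back unchanged.
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    return "\t".join(clean_field(f) for f in body.split("\t")) + ending


print(repr(clean_line("123\tb'bar foo'\tabc\t123\r\n")))
```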

Upvotes: 0

cobbal

Reputation: 70743

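For each line, you can use the following (the lazy .*? and \W*? keep the match from running across several quoted fields on one line):

re.sub(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''', r'\2', line)

And for bonus points, to run it over the whole file: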

import re

pattern = re.compile(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''')
# Open both files in one with statement so the input file is closed too.
with open('infile') as infile, open('outfile', 'w') as outfile:
    for line in infile:
        outfile.write(pattern.sub(r'\2', line))
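To try the loop above end to end, here is a self-contained sketch that writes a small sample file, runs the substitution, and reads the result back. The file locations and the strip_file helper are placeholders, and newline='' is an added assumption (Python 3) so that the \r\n line endings from the question pass through untranslated.

```python
import os
import re
import tempfile

pattern = re.compile(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''')


def strip_file(src, dst):
    # newline='' turns off newline translation on both the read and the write.
    with open(src, newline='') as infile, open(dst, 'w', newline='') as outfile:
        for line in infile:
            outfile.write(pattern.sub(r'\2', line))


# Build a throwaway input file containing the example line from the question.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'infile')
dst = os.path.join(workdir, 'outfile')
with open(src, 'w', newline='') as f:
    f.write("123\t b'bar foo' \tabc\t123\r\n")

strip_file(src, dst)

with open(dst, newline='') as f:
    result = f.read()
print(repr(result))
```

Note that the \W* parts of the pattern also consume the padding spaces around the field, so the cleaned field comes out as bar foo with no surrounding spaces.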

Upvotes: 1
