Replace tab in an enclosed string in a tab delimited file Python

Question

I have a file exported as TAB delimited in which some string fields contain a TAB character, so that upon import the columns shift over. I've found a few ways to fix this with other tools (see "replace tab in an enclosed string in a tab delimited file linux", for instance, for a solution using gawk), but I would like to be able to do this from my Jupyter Notebook using Python.

Sample Data (fields are separated by TAB characters, and the first field contains one):

"bad	string"	1	"good string"	2	"also good"	"01/01/01"

Needs to become

"bad string"	1	"good string"	2	"also good"	"01/01/01"

I assume regex is the key, but I'm not proficient enough in it to pull that together quickly. Right now I am working with the idea of splitting on tabs, finding the strings that are missing an end or start quote, and then threading those back together, but there are some possible pitfalls with that method the way I have it now.

Any help would be appreciated. Thanks...JP

blhsing · Accepted Answer

Tabs in a field of a tab-delimited CSV are not a problem as long as the field is properly quoted, which is the case here. So instead of replacing tabs with spaces, you can simply use csv.reader with the delimiter parameter set to '\t':

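from io import StringIO
import csv

# Sample row: the first quoted field contains an embedded tab (\t)
f = StringIO('''"bad\tstring"\t1\t"good string"\t2\t"also good"\t"01/01/01"''')
# csv.reader honors the quotes, so the embedded tab stays inside its field
print(list(csv.reader(f, delimiter='\t')))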

This outputs:

[['bad\tstring', '1', 'good string', '2', 'also good', '01/01/01']]

And if you still insist on replacing tabs with spaces, you can then easily do that by replacing each '\t' in the fields generated by csv.reader:

f.seek(0)  # rewind: the reader above has already consumed f
print([[s.replace('\t', ' ') for s in row] for row in csv.reader(f, delimiter='\t')])

This outputs:

[['bad string', '1', 'good string', '2', 'also good', '01/01/01']]

with which you can use csv.writer.writerows to write back to a CSV if so desired.
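
As a minimal sketch of that last step (not part of the original answer): the helper name replace_embedded_tabs and its replacement parameter are made up for illustration, and StringIO objects stand in for real files. It reads tab-delimited rows, replaces tabs inside fields, and writes the rows back with csv.writer.

```python
import csv
from io import StringIO


def replace_embedded_tabs(src, dst, replacement=" "):
    """Copy tab-delimited rows from src to dst, replacing tabs inside fields.

    Hypothetical helper for illustration; src and dst are file-like objects.
    """
    reader = csv.reader(src, delimiter="\t")
    # The default QUOTE_MINIMAL quotes a field only when it needs quoting,
    # so the quoting of the output may differ from the input.
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace("\t", replacement) for field in row])


# In-memory stand-ins for the exported file and the cleaned copy
src = StringIO('"bad\tstring"\t1\t"good string"\t2\t"also good"\t"01/01/01"\n')
dst = StringIO()
replace_embedded_tabs(src, dst)
print(repr(dst.getvalue()))
```

Once the embedded tab is gone, none of the fields needs quoting any more, so the writer emits them unquoted. With real files, open both with newline='' as the csv module documentation recommends, and pass quoting=csv.QUOTE_ALL to csv.writer if every field should be quoted on output.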
