Reputation: 1843
I have a tab-delimited text file that may have some values containing newlines, like this:
col1 col2 col3
row1 val1 "Some text
containing newlines. Yup, possibly
more than one..." val3
row2 val4 "val5" val6
Note:
I am trying to write a small Python script using re in order to:
It would be great to have it in a form like this:
def normalize_format(data, delimiter='\t'):
    data = re.sub(_DESIRED_REGEX_, r'"\1"', data)
    return data
where data is the whole file contents as a single string and _DESIRED_REGEX_ is the regex I would like to have figured out.
Using re is not mandatory, but a short and elegant solution is appreciated :)
Upvotes: 2
Views: 159
Reputation: 336108
You should be using the csv module instead:
import csv

# On Python 3, open the files in text mode with newline="" so that the
# csv module handles line endings itself (on Python 2, use "rb" and "wb").
with open("mycsv.csv", newline="") as infile, open("newcsv.csv", "w", newline="") as outfile:
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", quoting=csv.QUOTE_ALL)
    # Now you can remove all the newlines within fields
    # and write them back to a new CSV file:
    for row in reader:
        writer.writerow([field.replace("\n", " ") for field in row])
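If you would rather keep the normalize_format(data, delimiter) shape from the question, the same idea can probably be wrapped around in-memory strings. This is only a rough sketch, assuming Python 3 and a file small enough to hold in memory. Replacing each embedded newline with a space and ending rows with "\n" are my assumptions, not requirements stated in the question.

```python
import csv
import io


def normalize_format(data, delimiter="\t"):
    """Quote every field and replace newlines inside fields with spaces."""
    # newline="" stops StringIO from translating line endings, so the csv
    # module sees the text exactly as it was read from the file.
    reader = csv.reader(io.StringIO(data, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    # Assumption: rows end with "\n" (the csv default would be "\r\n").
    writer = csv.writer(out, delimiter=delimiter,
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [field.replace("\r\n", " ").replace("\n", " ") for field in row]
        )
    return out.getvalue()
```

You would call it as normalize_format(open("mycsv.csv", newline="").read()) and write the returned string wherever you need it.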
Upvotes: 2