donalmg

Reputation: 655

Python: split files using multiple split delimiters

I have multiple CSV files that I need to parse in a loop to gather information. They all share the same format, but some are delimited by '\t' and others by ','. After splitting, I want to remove the double quotes from around each string.

Can Python split on multiple possible delimiters?

At the moment, I can split each line on a single delimiter using:

with open(filename, "r") as f:  # the file is closed automatically
    for fs in f:
        sf = fs.rstrip('\n').split('\t')    # split on a single delimiter
        tf = [fi.strip('"') for fi in sf]   # drop the surrounding double quotes

Upvotes: 2

Views: 8251

Answers (2)

interjay

Reputation: 110191

Splitting each line on both delimiters is not a good idea: it will fail if there is a comma within one of the fields. For example, in a tab-delimited file, the line "field1"\t"Hello, world"\t"field3" will be split into 4 fields instead of 3.

Instead, you should use the csv module. It contains the helpful Sniffer class, which can detect the delimiter used in a file. The csv module will also remove the double quotes for you.

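import csv

# newline="" lets the csv module handle line endings itself
with open("example.csv", newline="") as csvfile:
    # guess the dialect (delimiter, quoting) from the first 1024 characters
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)  # rewind so the reader starts at the first row
    reader = csv.reader(csvfile, dialect)

    for line in reader:
        print(line)  # process line (a list of field strings)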
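
Since only tabs and commas are expected here, the sniffing step can be restricted to those two candidates through the delimiters argument of Sniffer.sniff. The following is a minimal sketch, not a definitive implementation: the helper name read_rows, its sample_size parameter, and the in-memory sample data are assumptions made for illustration, and Python 3 is assumed.

```python
import csv
import io


def read_rows(fileobj, sample_size=1024):
    """Return all rows of a tab- or comma-delimited file-like object.

    read_rows and sample_size are hypothetical names for this sketch.
    """
    sample = fileobj.read(sample_size)
    # Only consider the two delimiters the question mentions.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    fileobj.seek(0)  # rewind so the reader starts at the first row
    return list(csv.reader(fileobj, dialect))


# In-memory stand-ins for the real files (made-up sample data).
tab_file = io.StringIO('"field1"\t"Hello, world"\t"field3"\n"a"\t"b"\t"c"\n')
comma_file = io.StringIO('"x","y\tz","w"\n"1","2","3"\n')

for f in (tab_file, comma_file):
    for row in read_rows(f):
        print(row)
```

Sniffer.sniff raises csv.Error when it cannot work out a delimiter, so a real loop over many files may want to catch that exception and fall back to a default.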

Upvotes: 14

Matthew Flaschen

Reputation: 285047

You can do this with regex (optionally compiled):

import re

sf = re.split(r'[,\t]', fs)

This doesn't account for e.g. commas inside tab-delimited fields. I would see if the csv module is helpful.
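
As a rough sketch of the compiled variant and of the caveat above, the pattern can be compiled once and reused for every line. The names DELIMITERS and split_fields and the sample lines are assumptions made for illustration.

```python
import re

# Compile once and reuse the pattern for every line.
DELIMITERS = re.compile(r'[,\t]')


def split_fields(line):
    """Split on commas or tabs, then strip the surrounding double quotes."""
    pieces = DELIMITERS.split(line.rstrip('\r\n'))
    return [piece.strip('"') for piece in pieces]


# A plain comma-delimited line splits as intended.
print(split_fields('"a","b","c"\n'))

# A comma inside a tab-delimited field is treated as a delimiter too,
# so this line comes back as 4 pieces instead of 3.
print(split_fields('"a"\t"Hello, world"\t"c"\n'))
```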

Upvotes: 2
