Reputation: 989
Could someone provide an effective way to check whether a file is in CSV format using Python?
Upvotes: 40
Views: 48250
Reputation: 963
Adding to the answer by gotgenes: I got good results by also checking for non-printable characters, which should(tm) not appear in CSV files.
import csv
import string

def is_csv(infile):
    try:
        with open(infile, newline='') as csvfile:
            start = csvfile.read(4096)
            # isprintable does not allow newlines, printable does not allow umlauts...
            if not all(c in string.printable or c.isprintable() for c in start):
                return False
            # sniff() raises csv.Error if it cannot determine a dialect.
            csv.Sniffer().sniff(start)
            return True
    except (csv.Error, UnicodeDecodeError):
        # Could not get a csv dialect (or could not decode the file as text)
        # -> probably not a csv.
        return False
Upvotes: 3
Reputation: 40029
You could try something like the following, but getting a dialect back from csv.Sniffer
is not, on its own, sufficient to guarantee that you have a valid CSV document.
import csv

# Open in text mode: in Python 3, csv.Sniffer().sniff() expects a str, not bytes.
with open(somefile, newline='') as csv_fileh:
    try:
        dialect = csv.Sniffer().sniff(csv_fileh.read(1024))
        # Perform various checks on the dialect (e.g., lineterminator,
        # delimiter) to make sure it's sane
        # Don't forget to reset the read position back to the start of
        # the file before reading any entries.
        csv_fileh.seek(0)
    except csv.Error:
        # File appears not to be in CSV format; move along
        pass
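As a rough illustration of the "various checks on the dialect" step, here is a minimal sketch. The function name plausible_dialect and the set ALLOWED_DELIMITERS are hypothetical and not part of the csv module; which delimiters count as "sane" is an assumption you would adjust for your own data.

```python
import csv

# Assumption: only these delimiters are considered plausible for real CSV data.
ALLOWED_DELIMITERS = {",", ";", "\t", "|"}

def plausible_dialect(sample):
    """Return the sniffed dialect, or None if the sample fails the checks."""
    try:
        dialect = csv.Sniffer().sniff(sample)
    except csv.Error:
        # The sniffer could not determine a delimiter at all.
        return None
    # The sniffer may settle on a space or another odd character for
    # ordinary prose, so reject anything outside the allowed set.
    if dialect.delimiter not in ALLOWED_DELIMITERS:
        return None
    return dialect
```

You would pass it the same text that was read for sniffing, for example plausible_dialect(csv_fileh.read(1024)), and treat a None result like the csv.Error case.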
Upvotes: 39
Reputation: 76053
You need to think clearly about what you consider a CSV file to be.
For example, what sort of characters can occur between the commas? Is it text only? Can it contain Unicode characters as well? Should every line have the same number of commas?
There is no strict definition of a CSV file that I'm aware of. Usually it's ASCII text separated by commas, where every line has the same number of commas and is terminated by your platform's line terminator.
Anyway, once you answer the questions above, you'll be a bit further along in knowing how to detect whether a file is a CSV file.
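To make this concrete, here is one possible sketch of such a definition turned into code: ASCII only, comma separated, and the same number of fields on every line. The function name looks_like_strict_csv is hypothetical, and requiring at least two columns is an extra assumption added so that ordinary single-column text is not accepted.

```python
import csv

def looks_like_strict_csv(text, delimiter=","):
    """Check text against one narrow, assumed definition of CSV."""
    # Assumption: only ASCII content is allowed (str.isascii needs Python 3.7+).
    if not text.isascii():
        return False
    # Parse line by line and ignore completely empty lines.
    rows = [row for row in csv.reader(text.splitlines(), delimiter=delimiter) if row]
    if not rows:
        return False
    # Every row must have the same number of fields as the first one.
    width = len(rows[0])
    return width > 1 and all(len(row) == width for row in rows)
```

If your answers to the questions differ (Unicode allowed, ragged rows allowed, single-column files allowed), the corresponding check is the one to relax.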
Upvotes: -2
Reputation: 53320
Python has a csv module, so you could try parsing the file under a variety of different dialects.
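A minimal sketch of that idea, assuming that "parses successfully" means every non-empty row has the same number of fields and there are at least two columns. The function name first_matching_dialect is hypothetical; "excel" and "excel-tab" are dialect names that the csv module registers by default.

```python
import csv

def first_matching_dialect(text, dialect_names=("excel", "excel-tab")):
    """Return the first dialect name that yields consistent rows, else None."""
    for name in dialect_names:
        try:
            # strict=True makes the reader raise csv.Error on malformed input.
            reader = csv.reader(text.splitlines(), dialect=name, strict=True)
            rows = [row for row in reader if row]
        except csv.Error:
            continue
        widths = {len(row) for row in rows}
        # Assumption: a match means one consistent width of at least two fields.
        if len(widths) == 1 and min(widths) > 1:
            return name
    return None
```

Other candidates, such as a semicolon-delimited dialect, could be registered with csv.register_dialect and added to dialect_names.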
Upvotes: 1