Joe

Reputation: 989

Check if file has a CSV format with Python

Could someone suggest an effective way to check whether a file is in CSV format using Python?

Upvotes: 40

Views: 48250

Answers (5)

domenukk

Reputation: 963

Adding to the answer by gotgenes: I got good results by also checking for non-printable characters, which should(tm) not appear in CSV files.

import csv
import string


def is_csv(infile):
    try:
        with open(infile, newline='') as csvfile:
            start = csvfile.read(4096)

            # isprintable does not allow newlines, printable does not allow umlauts...
            if not all(c in string.printable or c.isprintable() for c in start):
                return False
            # sniff() raises csv.Error if it cannot determine a dialect
            csv.Sniffer().sniff(start)
            return True
    except (csv.Error, UnicodeDecodeError):
        # Could not decode the file or get a csv dialect -> probably not a csv.
        return False

Upvotes: 3

Hugh Bothwell

Reputation: 56654

Try parsing it as CSV and see if you get an error.
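
A minimal sketch of this idea, assuming Python 3. The helper name `parses_as_csv` and the choice of `strict=True` are assumptions made for illustration, not part of the answer above. By default `csv.reader` is very lenient, so `strict=True` is passed to make malformed quoting raise `csv.Error`; even so, plenty of non-CSV text files will parse without complaint.

```python
import csv


def parses_as_csv(path, encoding="utf-8"):
    """Return True if every row of the file parses without csv.Error.

    Hypothetical helper: the name and the strict=True choice are
    assumptions made for this sketch.
    """
    try:
        with open(path, newline="", encoding=encoding) as f:
            # strict=True makes the reader raise csv.Error on bad quoting
            for _ in csv.reader(f, strict=True):
                pass
        return True
    except (csv.Error, UnicodeDecodeError):
        # Malformed quoting or undecodable bytes: treat as "not CSV"
        return False
```

A file that passes this check is only "not obviously broken"; it says nothing about whether the columns make sense.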

Upvotes: -1

gotgenes

Reputation: 40029

You could try something like the following, but getting a dialect back from csv.Sniffer is not sufficient to guarantee that you have a valid CSV document.

import csv

# Text mode with newline='': in Python 3 the Sniffer needs str, not bytes
csv_fileh = open(somefile, newline='')
try:
    dialect = csv.Sniffer().sniff(csv_fileh.read(1024))
    # Perform various checks on the dialect (e.g., lineterminator,
    # delimiter) to make sure it's sane

    # Don't forget to reset the read position back to the start of
    # the file before reading any entries.
    csv_fileh.seek(0)
except csv.Error:
    # File appears not to be in CSV format; move along
    pass
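
The "various checks on the dialect" could look something like the sketch below. The helper name `sniff_sane_dialect` and the set of accepted delimiters are assumptions chosen for illustration; what counts as sane depends on the files you expect.

```python
import csv

# Assumption: the delimiters this application is willing to accept.
ALLOWED_DELIMITERS = {",", ";", "\t", "|"}


def sniff_sane_dialect(sample):
    """Return the sniffed dialect, or None if sniffing fails or looks wrong."""
    try:
        dialect = csv.Sniffer().sniff(sample)
    except csv.Error:
        # The Sniffer could not determine a delimiter at all
        return None
    if dialect.delimiter not in ALLOWED_DELIMITERS:
        # The Sniffer will happily guess letters or spaces for plain text
        return None
    return dialect
```

The delimiter check matters because the Sniffer often returns a dialect for ordinary prose, just with an implausible delimiter.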

Upvotes: 39

Assaf Lavie

Reputation: 76053

You need to think clearly about what you consider a CSV file to be.

For example, what sort of characters can occur between the commas? Is it text-only? Can it contain Unicode characters as well? Should every line have the same number of commas?

There is no strict definition of a CSV file that I'm aware of. Usually it's ASCII text separated by commas, where every line has the same number of commas and is terminated by your platform's line terminator.

Anyway, once you answer the questions above you'll be a bit farther on your way to knowing how to detect when a file is a CSV file.
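
As one example of turning an answer into a check, here is a rough sketch for the "same number of commas on every line" rule. The helper name `has_consistent_columns` is hypothetical, and requiring at least two columns is an assumption (a single-column file can be perfectly valid CSV under a looser definition).

```python
import csv
import io


def has_consistent_columns(text, delimiter=","):
    """Return True if all non-empty rows have the same field count (>= 2)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return False
    width = len(rows[0])
    # Assumption: a one-column file is not treated as CSV here
    return width > 1 and all(len(row) == width for row in rows)
```

Counting parsed fields rather than raw commas means a quoted value such as "Smith, John" still counts as one field.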

Upvotes: -2

Douglas Leeder

Reputation: 53320

Python has a csv module, so you could try parsing the file under several different dialects.
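
A sketch of what that might look like, using the dialects registered with the csv module (`csv.list_dialects()`). The helper name `matching_dialects` and the "consistent column count greater than one" acceptance rule are assumptions for illustration, since the reader itself rarely raises an error.

```python
import csv
import io


def matching_dialects(sample):
    """Return names of registered dialects that give a consistent column count > 1."""
    matches = []
    for name in csv.list_dialects():
        try:
            reader = csv.reader(io.StringIO(sample, newline=""), name)
            rows = [row for row in reader if row]  # skip blank lines
        except csv.Error:
            continue
        widths = {len(row) for row in rows}
        # Assumption: a plausible dialect splits every row into the
        # same number of fields, and into more than one field.
        if rows and len(widths) == 1 and widths.pop() > 1:
            matches.append(name)
    return matches
```

An empty result suggests the sample is not CSV under any registered dialect; custom dialects can be added with `csv.register_dialect` before calling it.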

Upvotes: 1
