smatthewenglish

Reputation: 2899

eliminate malformed records from a large .csv file

I have a large .csv file that I want to process, perhaps with a Python script, to find all the records that are "malformed", e.g. those that have more or fewer values than the number of headers, and eliminate them.

What's the best way to do this?

Upvotes: 2

Views: 980

Answers (1)

tjohnson

Reputation: 1077

Here's a basic example:

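num_headers = 5  # expected number of fields per record
with open("input.csv", 'r') as file_in, open("output.csv", 'w') as file_out:
    # start=1 so reported line numbers match what a text editor shows
    for i, line in enumerate(file_in, start=1):
        # naive split: a quoted field containing a comma is counted as two fields
        if len(line.split(",")) == num_headers:
            file_out.write(line)
        else:
            print("line %d is malformed" % i)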

Or using the csv module, which is more flexible across CSV dialects and correctly handles quoted fields that contain commas:

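import csv
num_headers = 5  # expected number of fields per record
# newline='' lets the csv module handle line endings itself (Python 3)
with open("input.csv", 'r', newline='') as file_in, open("output.csv", 'w', newline='') as file_out:
    csv_in = csv.reader(file_in)
    csv_out = csv.writer(file_out)
    # start=1 so reported record numbers are 1-based
    for i, row in enumerate(csv_in, start=1):
        if len(row) == num_headers:
            csv_out.writerow(row)
        else:
            print("line %d is malformed" % i)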
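
Since the question compares each record against the number of headers, here is a sketch of a variant that takes the expected count from the first row instead of hard-coding it. It assumes the file's first row is the header; the function name `filter_malformed` and the choice to return the skipped line numbers are illustrative, not part of the original answer.

```python
import csv

def filter_malformed(in_path, out_path):
    """Copy rows whose field count matches the header row.

    Returns the input line numbers of the rows that were skipped.
    Assumes the first row of the input is the header.
    """
    skipped = []
    with open(in_path, newline="") as file_in, \
         open(out_path, "w", newline="") as file_out:
        csv_in = csv.reader(file_in)
        csv_out = csv.writer(file_out)
        header = next(csv_in, None)
        if header is None:
            return skipped  # empty input: nothing to copy
        csv_out.writerow(header)
        for row in csv_in:
            if len(row) == len(header):
                csv_out.writerow(row)
            else:
                # line_num is the physical line the reader has reached
                skipped.append(csv_in.line_num)
    return skipped
```

Calling `filter_malformed("input.csv", "output.csv")` streams the file one row at a time, so memory use stays flat on a large input. Note that blank lines are also treated as malformed, because the reader yields them as empty rows.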

Upvotes: 3
