Reputation: 2899
I have a large .csv file and I want to process it, perhaps with a Python script, to find all the rows that are "malformed" (e.g. those that have more or fewer values than the number of headers) and eliminate them.
What's the best way to do this?
Upvotes: 2
Views: 980
Reputation: 1077
Here's a basic example:
num_headers = 5  # expected number of columns
with open("input.csv", 'r') as file_in, open("output.csv", 'w') as file_out:
    # start=1 so the reported line numbers match the file
    for i, line in enumerate(file_in, start=1):
        if len(line.split(",")) == num_headers:
            file_out.write(line)
        else:
            print("line %d is malformed" % i)
Or using the csv module (which is more flexible for different types of CSV formatting, e.g. it correctly handles quoted fields that contain commas, which a plain split would miscount):
import csv

num_headers = 5  # expected number of columns
# newline='' lets the csv module handle line endings itself (Python 3)
with open("input.csv", 'r', newline='') as file_in, open("output.csv", 'w', newline='') as file_out:
    csv_in = csv.reader(file_in)
    csv_out = csv.writer(file_out)
    for i, row in enumerate(csv_in, start=1):
        if len(row) == num_headers:
            csv_out.writerow(row)
        else:
            print("line %d is malformed" % i)
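Since the question defines "malformed" relative to the number of headers, the expected column count could also be read from the file instead of being hard-coded. The following is only a sketch under some assumptions: it targets Python 3, it assumes the first row of the file is the header row, and the function name `filter_malformed` is a hypothetical helper introduced here for illustration.

```python
import csv


def filter_malformed(in_path, out_path):
    """Copy well-formed rows to out_path and return the line numbers of rejected rows.

    Assumption: the first row of the input is the header, and its length
    defines how many values every other row should have.
    """
    rejected = []
    with open(in_path, newline="") as file_in, \
            open(out_path, "w", newline="") as file_out:
        csv_in = csv.reader(file_in)
        csv_out = csv.writer(file_out)

        header = next(csv_in, None)
        if header is None:
            return rejected  # empty input: nothing to copy
        csv_out.writerow(header)

        for row in csv_in:
            if len(row) == len(header):
                csv_out.writerow(row)
            else:
                # line_num is the number of source lines read so far,
                # i.e. the line on which this record ended
                rejected.append(csv_in.line_num)
    return rejected
```

Calling `filter_malformed("input.csv", "output.csv")` would write the cleaned file and return a list of line numbers you can log or inspect. Note that blank lines are treated as malformed in this sketch, because the reader returns them as empty rows.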
Upvotes: 3