I am trying to write a Python program to clean survey data from a CSV file. I would like to drop rows that contain a run of blank fields, like the first and third lines in the following example:
"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
Here is my code, which does not work:
import csv
fname = raw_input("Enter input file name: ")
if len(fname) < 1 : fname = "survey.csv"
foutput = raw_input("Enter output file name: ")
if len(foutput) < 1 : foutput = "output_"+fname
input = open(fname, 'rb')
output = open(foutput, 'wb')
searchFor = 5*['']
writer = csv.writer(output)
for row in csv.reader(input):
    if searchFor not in row:
        writer.writerow(row)
input.close()
output.close()
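The check above never matches because in on a list tests element membership: searchFor not in row asks whether the list ['', '', '', '', ''] is itself one of the fields. Every field is a string, so the condition is always true and every row is written. Below is a minimal sketch of one way to detect a run of consecutive blank fields instead, written for Python 3. The names has_blank_run and clean_file, the default run length of 5, and treating whitespace-only fields as blank are illustrative assumptions, not part of the question.

```python
import csv


def has_blank_run(row, n=5):
    """Return True if row contains at least n consecutive blank fields."""
    run = 0
    for field in row:
        # csv.reader yields '' for an empty field; strip() additionally
        # treats whitespace-only fields as blank (an assumption)
        if not field.strip():
            run += 1
            if run >= n:
                return True
        else:
            run = 0  # a non-blank field breaks the run
    return False


def clean_file(in_path, out_path, n=5):
    """Copy in_path to out_path, skipping rows that have a run of n blanks."""
    # in Python 3 the csv module expects files opened with newline=''
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not has_blank_run(row, n):
                writer.writerow(row)
```

On the sample data this keeps rows 2, 4 and 5. Unlike counting blanks anywhere in a row, it only drops rows whose blanks are adjacent, which matches the "sequence of blank fields" wording.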
Use a Counter to check whether one list is contained in another (counting repeated items), as below. If you would rather remove the empty elements, pass None, bool or len to filter to discard the blanks:
import csv
from itertools import repeat
from collections import Counter
# fname and foutput are read as in the question
input = open(fname, 'rb')
output = open(foutput, 'wb')
writer = csv.writer(output)
# Helper function: True if every item of list1 occurs in list2 at least as often
def counterSubset(list1, list2):
    c1, c2 = Counter(list1), Counter(list2)
    for k, n in c1.items():
        if n > c2[k]:
            return False
    return True
for row in csv.reader(input):
    # repeat('', 5) gives five '' items; change 5 to the number you need
    if not counterSubset(list(repeat('', 5)), row):
        # use filter(None, row), filter(bool, row) or filter(len, row) to remove empty elements
        writer.writerow(row)
input.close()
output.close()
Output for the sample input above:
2,a,b,c,d,e,f,,h
4,a,z,u,d,i,f,x,h
5,d,c,c,,c,f,g,z
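The counterSubset helper above tests multiset containment: every value must occur in the row at least as many times as it does in the first list. Below is a minimal sketch of the same test using Counter subtraction, which discards non-positive counts, together with the filter idea from this answer. In Python 3, filter returns an iterator, hence the list() call. The names is_multiset_subset and strip_blanks are hypothetical.

```python
from collections import Counter


def is_multiset_subset(small, big):
    """True if every item of small occurs in big at least as often."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means nothing in small is left unmatched by big
    return not (Counter(small) - Counter(big))


def strip_blanks(row):
    """Return row without its empty fields (filter(None, ...) drops falsy items)."""
    return list(filter(None, row))


row1 = ['1', 'a', 'b', 'c', '', '', '', '', '']
row5 = ['5', 'd', 'c', 'c', '', 'c', 'f', 'g', 'z']

# five blanks are contained in row 1 but not in row 5
print(is_multiset_subset([''] * 5, row1))  # True
print(is_multiset_subset([''] * 5, row5))  # False
print(strip_blanks(row5))  # ['5', 'd', 'c', 'c', 'c', 'f', 'g', 'z']
```

Because the list being searched for contains a single distinct value, this test is equivalent to row.count('') >= 5.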
How about this:
# csv.reader returns an empty field as the empty string ''
blank_item = ''
for row in csv.reader(input):
    # collect the blank fields in this row
    blanks = [x for x in row if x == blank_item]
    if len(blanks) < 5:
        writer.writerow(row)
This counts the blank fields in each row and writes only the rows that have fewer than five.
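csv.reader represents an empty field as the empty string '', never None, so the comparison has to be against ''. Since only the number of blanks matters, list.count gives the same result without building an intermediate list. A minimal sketch, assuming the threshold of five blanks from the question; keep_row is a hypothetical helper name:

```python
import csv


def keep_row(row, max_blanks=5):
    """Keep a row only if it has fewer than max_blanks empty fields."""
    return row.count('') < max_blanks


lines = [
    '"1","a","b","c",,,,,',
    '"2","a","b","c","d","e","f",,"h"',
    '"3","a","b","c",,,,,',
    '"4","a","z","u","d","i","f","x","h"',
    '"5","d","c","c",,"c","f","g","z"',
]

# csv.reader accepts any iterable of strings, not just a file object
kept = [row for row in csv.reader(lines) if keep_row(row)]
for row in kept:
    print(row[0])  # prints 2, then 4, then 5
```

Note that this counts blanks anywhere in the row, so a row with five scattered blanks is dropped as well.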