Anjo

Reputation: 175

Dump rows of a CSV file which contain a sequence of blank fields

I am trying to write a Python program to clean survey data coming from a CSV file. I would like to dump rows that contain a sequence of blank fields, like the first and third lines in the following example.

"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
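For reference, when `csv.reader` parses these rows it strips the quotes and returns each empty field as an empty string `''`, never `None`. A quick check, assuming Python 3 and the sample held in a string (the name `SAMPLE` is only for illustration):

```python
import csv
import io

# The sample rows from the question, held in memory for illustration
SAMPLE = '''"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
'''

rows = list(csv.reader(io.StringIO(SAMPLE)))
for row in rows:
    # Each empty field shows up as '' in the parsed list
    print(row)
```

Every row parses to nine fields; the first one, for example, becomes `['1', 'a', 'b', 'c', '', '', '', '', '']`.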

Here is my unsuccessful code:

import csv

fname = raw_input("Enter input file name: ")
if len(fname) < 1 : fname = "survey.csv"

foutput = raw_input("Enter output file name: ")
if len(foutput) < 1 : foutput = "output_"+fname


input = open(fname, 'rb')
output = open(foutput, 'wb')


searchFor = 5*['']

writer = csv.writer(output)

for row in csv.reader(input):
    if searchFor not in row :
        writer.writerow(row)

input.close()
output.close()
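The attempt above keeps every row because `searchFor not in row` asks whether the list `['', '', '', '', '']` is itself one of the elements of `row`; every element of a parsed row is a string, so that test is always true. A minimal sketch of a check for a consecutive run of blanks, assuming Python 3 and that blank fields arrive as `''` (the helper `has_blank_run` and its default `n=5` are hypothetical):

```python
import csv
import io


def has_blank_run(row, n=5):
    """Return True if row contains at least n consecutive empty fields."""
    run = 0
    for field in row:
        # Extend the current run on an empty field, reset it otherwise
        run = run + 1 if field == '' else 0
        if run >= n:
            return True
    return False


# The question's sample rows, held in memory for illustration
SAMPLE = '''"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
'''

# `in` checks whether the whole list is one element of the row, so this is False
print([''] * 5 in ['1', '', '', '', '', ''])

kept = [row for row in csv.reader(io.StringIO(SAMPLE)) if not has_blank_run(row)]
for row in kept:
    print(row[0])  # prints 2, 4 and 5
```

Passing each row of `kept` to `csv.writer` instead of printing it would produce the cleaned file. Because the check requires the blanks to be adjacent, a row with five blanks scattered among answered questions would be kept.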

Upvotes: 1

Views: 88

Answers (2)

Learner

Reputation: 5292

Use a Counter to check whether one list is a subset of another (counting repeated items), as below. If you instead want to remove the empty elements, pass None, bool or len to filter to discard the blanks:

import csv
from itertools import repeat
from collections import Counter

# fname and foutput are read from the user as in the question
input = open(fname, 'rb')
output = open(foutput, 'wb')

writer = csv.writer(output)

# Helper: True if every item of list1 occurs in list2 at least as often
def counterSubset(list1, list2):
    c1, c2 = Counter(list1), Counter(list2)
    for k, n in c1.items():
        if n > c2[k]:
            return False
    return True

for row in csv.reader(input):
    # Skip rows holding at least five '' fields; change 5 as needed
    if not counterSubset(list(repeat('', 5)), row):
        # Write filter(None, row), filter(bool, row) or filter(len, row)
        # instead of row to drop the empty fields themselves
        writer.writerow(row)

input.close()
output.close()

Output (apparently from a test file that differs from the sample in the question):

1,a,b,c,,
2,a,b,c,d,e,f,g,h
4,a,,z,u,d,i,f,x,h
5,d,c,c,d,c,f,g,z
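Note that `counterSubset` compares counts, so it is true whenever a row has at least five blank fields anywhere, not necessarily next to each other. The sketch below (Python 3; the example rows are made up for illustration) shows that behaviour, and that in Python 3 `filter` returns a lazy iterator, so wrap it in `list()` when you need an actual list:

```python
from collections import Counter
from itertools import repeat


def counterSubset(list1, list2):
    # True if every item of list1 occurs in list2 at least as many times
    c1, c2 = Counter(list1), Counter(list2)
    for k, n in c1.items():
        if n > c2[k]:
            return False
    return True


five_blanks = list(repeat('', 5))
trailing = ['1', 'a', 'b', 'c', '', '', '', '', '']         # five adjacent blanks
scattered = ['6', '', 'a', '', 'b', '', 'c', '', 'd', '']   # five blanks, none adjacent
one_blank = ['5', 'd', 'c', 'c', '', 'c', 'f', 'g', 'z']

print(counterSubset(five_blanks, trailing))   # True
print(counterSubset(five_blanks, scattered))  # True
print(counterSubset(five_blanks, one_blank))  # False

# filter(None, ...) keeps only truthy fields, i.e. drops the '' entries
print(list(filter(None, one_blank)))
```

On Python 3.10 and later, `Counter(list1) <= Counter(list2)` performs the same inclusion test without a helper.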

Upvotes: 1

timlyo

Reputation: 2384

How about

# csv.reader returns an empty string "" for a blank field, not None
blank_item = ""

for row in csv.reader(input):
    # collect all blank elements of the row
    blanks = [x for x in row if x == blank_item]
    if len(blanks) < 5:
        writer.writerow(row)

This counts the blank fields in each row and writes only rows with fewer than five of them, so rows with five or more blanks are dropped. Adjust the threshold as desired.
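Since blank fields come back as empty strings, `list.count` can do the counting directly. A minimal end-to-end sketch in Python 3, which opens both files in text mode with `newline=''` as the `csv` documentation recommends, in place of the Python 2 `'rb'`/`'wb'` modes; the function name `drop_blank_rows` and its `max_blanks` parameter are illustrative:

```python
import csv


def drop_blank_rows(in_path, out_path, max_blanks=5):
    """Copy in_path to out_path, skipping rows with max_blanks or more empty fields.

    Returns the number of rows written.
    """
    kept = 0
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # csv.reader represents an empty field as '', never None
            if row.count('') < max_blanks:
                writer.writerow(row)
                kept += 1
    return kept
```

With the question's default file names this would be called as `drop_blank_rows("survey.csv", "output_survey.csv")`. Like the answer above, it counts blanks anywhere in the row rather than requiring them to be consecutive.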

Upvotes: 0
