Reputation: 19

how to remove entire row that has empty cells in csv file using python

First off, I do not have pandas framework and am unable to install it. I am hoping that I can solve this problem without pandas.

I am trying to clean my data using python framework, by removing rows that contain empty cells.

this is my code:

import csv
input_file = 'test.csv'
output_file = 'test1.csv'

cols_to_remove =[0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49] 


cols_to_remove = sorted(cols_to_remove, reverse=True) 
row_count = 0 

with open(input_file, "r") as source: #to run and delete column  
   reader = csv.reader(source)  
   with open(output_file, "w") as result:
       writer = csv.writer(result)
       for row in reader:
           row_count += 1
           print('\r{0}'.format(row_count)) # Print rows processed
           for col_index in cols_to_remove:
               del row[col_index]
           writer.writerow(row)
           print(row)

I have tried codes from other similar questions asked, however it prints into an empty file.

Upvotes: 0

Answers (4)

Abhi_J

Reputation: 2129

Checking length of row as someone suggested might not work as empty rows in csv may contain empty Strings ''. So len(row) wouldn't return 0 because it is a list of empty strings.

To simply delete empty rows in csv try adding the following check

skip_row = True
for item in row:
    if item != None and item != '':
        skip_row = False
if not skip_row:
   # process row

Full code would be

import csv
input_file = 'test.csv'
output_file = 'test1.csv'

cols_to_remove = [0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49] 


cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0

with open(input_file, "r") as source: #to run and delete column
   reader = csv.reader(source)
   with open(output_file, "w", newline='' ) as result:
       writer = csv.writer(result)
       for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count)) # Print rows processed
            skip_row = True
            for item in row:
                if item != None and item != '':
                    skip_row = False
            if not skip_row:
                for col_index in cols_to_remove:
                    del row[col_index]
                writer.writerow(row)
                print(row)

Here we check if each element of a row is None type or an empty string and decide if we should skip that row or not.

Upvotes: 1

Vova

Reputation: 3543

try to skip row if any cell empty like this:

if any(cel is None or cel == '' for cel in row):
   continue

Here's the code:

import csv

input_file = 'hey.txt'
output_file = 'test1.csv'
cols_to_remove = [0, 1, 9, 11, 14, 15, 23, 28, 29, 32, 33, 37, 38]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0

with open(input_file, "r+") as source:  # to run and delete column
    reader = csv.reader(source)
    with open(output_file, "w+") as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count))  # Print rows processed
            if any(cel is None or cel == '' for cel in row):
                continue
            for col_index in cols_to_remove:
                try:
                    del row[col_index]
                except Exception:
                    pass
            writer.writerow(row)
            print(row)

files and cols_to_remove were changed, please, use your own. step of skipping might be moving after row deleting.

Upvotes: 1

C Hecht

Reputation: 1016

Assuming that the empty row is in fact one that looks like this

,,,,,,,,,,,,,, (and many more)

you can do the following:

import csv
input_file = 'test.csv'
output_file = 'test1.csv'

cols_to_remove =[0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49] 


cols_to_remove = sorted(cols_to_remove, reverse=True) 
row_count = 0 

with open(input_file, "r") as source: #to run and delete column  
   reader = csv.reader(source)  
   with open(output_file, "w") as result:
       writer = csv.writer(result)
       for row in reader:
           row_count += 1
           print('\r{0}'.format(row_count)) # Print rows processed
           all_empty = False # new
           for cell in row: # new
               if len(cell) == 0: # new
                   all_empty = True # new
                   break# new
           if all_empty: # new
               continue # new
           for col_index in cols_to_remove:
               del row[col_index]
           writer.writerow(row)
           print(row)

Upvotes: 1

Prima

Reputation: 70

you can use this for delete all the row that have a null value,

data = data.dropna()

and this one for delete column,

data = data.drop(['column'], axis=1)

Upvotes: 0

how to remove entire row that has empty cells in csv file using python

Answers (4)

Related Questions