Reputation: 19
First off, I do not have pandas framework and am unable to install it. I am hoping that I can solve this problem without pandas.
I am trying to clean my data using python framework, by removing rows that contain empty cells.
this is my code:
import csv
input_file = 'test.csv'
output_file = 'test1.csv'
cols_to_remove =[0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0
with open(input_file, "r") as source: #to run and delete column
reader = csv.reader(source)
with open(output_file, "w") as result:
writer = csv.writer(result)
for row in reader:
row_count += 1
print('\r{0}'.format(row_count)) # Print rows processed
for col_index in cols_to_remove:
del row[col_index]
writer.writerow(row)
print(row)
I have tried codes from other similar questions asked, however it prints into an empty file.
Upvotes: 0
Views: 1241
Reputation: 2129
Checking length of row
as someone suggested might not work as empty rows in csv may contain empty Strings ''
. So len(row)
wouldn't return 0 because it is a list of empty strings.
To simply delete empty rows in csv try adding the following check
skip_row = True
for item in row:
if item != None and item != '':
skip_row = False
if not skip_row:
# process row
Full code would be
import csv
input_file = 'test.csv'
output_file = 'test1.csv'
cols_to_remove = [0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0
with open(input_file, "r") as source: #to run and delete column
reader = csv.reader(source)
with open(output_file, "w", newline='' ) as result:
writer = csv.writer(result)
for row in reader:
row_count += 1
print('\r{0}'.format(row_count)) # Print rows processed
skip_row = True
for item in row:
if item != None and item != '':
skip_row = False
if not skip_row:
for col_index in cols_to_remove:
del row[col_index]
writer.writerow(row)
print(row)
Here we check if each element of a row is None
type or an empty string and decide if we should skip that row or not.
Upvotes: 1
Reputation: 3543
try to skip row if any cell empty like this:
if any(cel is None or cel == '' for cel in row):
continue
Here's the code:
import csv
input_file = 'hey.txt'
output_file = 'test1.csv'
cols_to_remove = [0, 1, 9, 11, 14, 15, 23, 28, 29, 32, 33, 37, 38]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0
with open(input_file, "r+") as source: # to run and delete column
reader = csv.reader(source)
with open(output_file, "w+") as result:
writer = csv.writer(result)
for row in reader:
row_count += 1
print('\r{0}'.format(row_count)) # Print rows processed
if any(cel is None or cel == '' for cel in row):
continue
for col_index in cols_to_remove:
try:
del row[col_index]
except Exception:
pass
writer.writerow(row)
print(row)
files and cols_to_remove were changed, please, use your own. step of skipping might be moving after row deleting.
Upvotes: 1
Reputation: 1016
Assuming that the empty row is in fact one that looks like this
,,,,,,,,,,,,,, (and many more)
you can do the following:
import csv
input_file = 'test.csv'
output_file = 'test1.csv'
cols_to_remove =[0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0
with open(input_file, "r") as source: #to run and delete column
reader = csv.reader(source)
with open(output_file, "w") as result:
writer = csv.writer(result)
for row in reader:
row_count += 1
print('\r{0}'.format(row_count)) # Print rows processed
all_empty = False # new
for cell in row: # new
if len(cell) == 0: # new
all_empty = True # new
break# new
if all_empty: # new
continue # new
for col_index in cols_to_remove:
del row[col_index]
writer.writerow(row)
print(row)
Upvotes: 1
Reputation: 70
you can use this for delete all the row that have a null value,
data = data.dropna()
and this one for delete column,
data = data.drop(['column'], axis=1)
Upvotes: 0