Reputation: 73
I have a csvfile that I read in like so:
with io.open(script, 'r', encoding='utf-8-sig') as csvfile:
entireFile = csv.reader(csvfile)
I want to loop through 'entireFile' and copy specific columns to a new list. I have another list, lets call it 'wantedColumnsList', that contains the names of the column headers that I want to copy over like so:
wantedColumnsList = ["Column2", "Column5", "Column9"]
So to reiterate, I want to loop through 'entireFile' and copy all the data into a new list, but only the data contained in the columns outlined in wantedColumnsList. The reason I want to to do this (just to provide some context) is so that I can loop through the new list and search for any null values. If I find a null value, I want to be able to print out the column name that contains the null. Essentially I'm creating a script that reads in a file and checks if certain columns contain null values when they shouldn't. Thank you!
Upvotes: 2
Views: 81
Reputation: 27869
If you are open to pandas
you can use this code to filter out desired columns:
import pandas as pd
df = pd.read_csv('your_file.csv', encoding='utf-8-sig')
wantedColumnsList = ["Column2", "Column5", "Column9"]
df = df[wantedColumnsList]
And if you want to dump it to csv
afterwards just use:
df.to_csv('your_new_file.csv', encoding='utf-8-sig')
EDIT:
And to find columns with nan
:
df.columns[df.isnull().any()].tolist()
To find specific rows with nan
:
df[df.isnull().any(axis=1)]
Upvotes: 1
Reputation: 87064
You don't need to create a second list to check for nulls. Just iterate over a csv.DictReader
, check the wanted fields for null values, and print the field names if null(s) are found.
import csv
wanted_fields = ("Column2", "Column5", "Column9")
with open('data.csv') as f:
for line_no, row in enumerate(csv.DictReader(f), 2):
null_fields = [field for field in wanted_fields if row[field] == '']
if null_fields:
print('Line {} contains null in column(s) {}'.format(line_no, ', '.join(null_fields)))
Demo
For sample data:
Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9,Column10 1,2,3,4,5,6,7,8,9,10 1,,3,4,5,6,7,8,9,10 1,,3,4,,6,7,8,9,10 1,,3,4,,6,7,8,,10 1,2,3,4,,6,7,8,9,10 ,,,,,,,,,
Output
Line 3 contains null in column(s) Column2 Line 4 contains null in column(s) Column2, Column5 Line 5 contains null in column(s) Column2, Column5, Column9 Line 6 contains null in column(s) Column5 Line 7 contains null in column(s) Column2, Column5, Column9
Upvotes: 0
Reputation: 1374
Use str.split(splitting_symbol)
to split your csv to a 2D list and iterate it.
Upvotes: 0