Daniel
Daniel

Reputation: 73

Create a list in Python containing data from specified columns in a csvfile

I have a csvfile that I read in like so:

with io.open(script, 'r', encoding='utf-8-sig') as csvfile:
    entireFile = csv.reader(csvfile)

I want to loop through 'entireFile' and copy specific columns to a new list. I have another list, lets call it 'wantedColumnsList', that contains the names of the column headers that I want to copy over like so:

wantedColumnsList = ["Column2", "Column5", "Column9"]

So to reiterate, I want to loop through 'entireFile' and copy all the data into a new list, but only the data contained in the columns outlined in wantedColumnsList. The reason I want to to do this (just to provide some context) is so that I can loop through the new list and search for any null values. If I find a null value, I want to be able to print out the column name that contains the null. Essentially I'm creating a script that reads in a file and checks if certain columns contain null values when they shouldn't. Thank you!

Upvotes: 2

Views: 81

Answers (3)

zipa
zipa

Reputation: 27869

If you are open to pandas you can use this code to filter out desired columns:

import pandas as pd

df = pd.read_csv('your_file.csv', encoding='utf-8-sig')
wantedColumnsList = ["Column2", "Column5", "Column9"]

df = df[wantedColumnsList]

And if you want to dump it to csv afterwards just use:

df.to_csv('your_new_file.csv', encoding='utf-8-sig')

EDIT: And to find columns with nan:

df.columns[df.isnull().any()].tolist()

To find specific rows with nan:

df[df.isnull().any(axis=1)]

Upvotes: 1

mhawke
mhawke

Reputation: 87064

You don't need to create a second list to check for nulls. Just iterate over a csv.DictReader, check the wanted fields for null values, and print the field names if null(s) are found.

import csv

wanted_fields = ("Column2", "Column5", "Column9")

with open('data.csv') as f:
    for line_no, row in enumerate(csv.DictReader(f), 2):
        null_fields = [field for field in wanted_fields if row[field] == '']
        if null_fields:
            print('Line {} contains null in column(s) {}'.format(line_no, ', '.join(null_fields)))

Demo

For sample data:

Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9,Column10
1,2,3,4,5,6,7,8,9,10
1,,3,4,5,6,7,8,9,10
1,,3,4,,6,7,8,9,10
1,,3,4,,6,7,8,,10
1,2,3,4,,6,7,8,9,10
,,,,,,,,,

Output

Line 3 contains null in column(s) Column2
Line 4 contains null in column(s) Column2, Column5
Line 5 contains null in column(s) Column2, Column5, Column9
Line 6 contains null in column(s) Column5
Line 7 contains null in column(s) Column2, Column5, Column9

Upvotes: 0

Kotauskas
Kotauskas

Reputation: 1374

Use str.split(splitting_symbol) to split your csv to a 2D list and iterate it.

Upvotes: 0

Related Questions