Joseph Sanders
Joseph Sanders

Reputation: 143

In python how can I remove entire rows where a specific column is blank?

I am trying to figure out how to make the code below exclude any instances where row[9] has a blank entry. Currently the csv file outputs for every Row, which results in rows with a blank entry for "Dad". I want the output csv file to not include any row where the "Dad" column (row[9]) is blank...

Any help is greatly appreciated!

def guardiansfather():
with open('hallpass_raw.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    with open('outputfiles/guardians_father.csv', mode='w', newline='') as output_file:
        write = csv.writer(output_file, delimiter=',', quoting=csv.QUOTE_MINIMAL)
        for row in csv_reader:
            a = row[0]
            b = row[1]
            c = row[2]
            studentnumber = row[3]
            firstname = row[4]
            middlename = row[5]
            lastname = row[6]
            teacher = row[7]
            idnumber = row[8]
            father = row[9]
            mother = row[10]
            guardianemail = row[11]
            phone = row[12]
            fatheremail = row[13]
            motheremail = row[14]
            guardianphone = row[15]
            schoolname = 'NAME OF SCHOOL'
            relationship = 'Father'
            father_first = father.split(sep=','[0])
            father_last = father.split(sep=', '[1])


            write.writerow([schoolname, studentnumber, father_first, father_last, relationship, phone, fatheremail])

Upvotes: 0

Views: 890

Answers (2)

Mayank Porwal
Mayank Porwal

Reputation: 34046

You can use pandas library for this:

1.) Read your csv into a pandas dataframe:

import pandas as pd
df = pd.read_csv('hallpass_raw.csv', sep=',')

Example: Let's say this is how your dataframe looks like.

In [365]: df
Out[365]: 
  fname   age  salary
0     a   5.0     1.5
1     a   5.0     1.5
2     b           1.0
3     b  15.0     
4     c  20.0     1.0

2.) Drop rows with specific column as NULL:

You want to drop rows which have NULL value in column age:

In [364]: df = df[df.age <> '']
Out[364]: 
  fname   age  salary
0     a   5.0     1.5
1     a   5.0     1.5
3     b  15.0     
4     c  20.0     1.0

You can see that 2nd row is dropped.

3.) Write the processed dataframe back to csv:

df.to_csv('new.csv', index=False)

With this, you don't have to worry about complex loops for processing your csv.

Upvotes: 1

Barmar
Barmar

Reputation: 780724

Use an if statement

def guardiansfather():
    with open('hallpass_raw.csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        with open('outputfiles/guardians_father.csv', mode='w', newline='') as output_file:
            write = csv.writer(output_file, delimiter=',', quoting=csv.QUOTE_MINIMAL)
            for row in csv_reader:
                a = row[0]
                b = row[1]
                c = row[2]
                studentnumber = row[3]
                firstname = row[4]
                middlename = row[5]
                lastname = row[6]
                teacher = row[7]
                idnumber = row[8]
                father = row[9]

                # Skip rows with empty father
                if father.strip() == '':
                    continue

                mother = row[10]
                guardianemail = row[11]
                phone = row[12]
                fatheremail = row[13]
                motheremail = row[14]
                guardianphone = row[15]
                schoolname = 'NAME OF SCHOOL'
                relationship = 'Father'
                father_first = father.split(sep=','[0])
                father_last = father.split(sep=', '[1])

                write.writerow([schoolname, studentnumber, father_first, father_last, relationship, phone, fatheremail])

Upvotes: 2

Related Questions