Nienke Luirink

Reputation: 141

How to ignore empty values in csv file and continue in Python

I have two example csv files, csvexample.csv looks like this:

ID Text  
1  'good morning'  
2  'good afternoon'  
3  'good evening'  

While csvexample1.csv looks like this:

Day Month  
14  'Feb'  
21  'Mar'  
31  'May' 

With the following code, I get the result that I want, which is to add the first column of csvexample.csv and the second column of csvexample1.csv to a single list, res.

import csv

res = []
with open('csvexample.csv') as f, open('csvexample1.csv') as a:
    reader=csv.reader(f) 
    reader1=csv.reader(a)
    next(reader)
    next(reader1)
    for row in zip(reader, reader1):
        res.extend([row[0][0], row[1][1]])  

print(res)   

I get the following outcome:

['1', 'Feb', '2', 'Mar', '3', 'May']  

However, the actual csv files I want to apply this code to contain some empty cells. I am adding the Twitter bios of companies from one file and the Tweets of those companies from another file into one list, but some companies do not have a bio on Twitter, so those cells in a specific column are empty. Furthermore, in most cases the first file has far fewer rows than the second file, and the output stops as soon as the first file has no rows left, ignoring all the remaining rows in the second file. For example, if I edit csvexample.csv like this:

ID Text   
1  'good morning'  
2  'good afternoon'   

3  'good evening'  
4  

and csvexample1.csv like this:

Day Month  
14  'Feb'  
21     
31  'May'  

I get the following outcome:

['1', 'feb', '2', '', '', 'may']  

instead of the desired outcome:

['1', 'feb', '2', '', '', 'may', '4']

I have tried many different things, but I can't get the required outcome. For example, I tried this:

from itertools import zip_longest
from io import StringIO
import csv

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []
with mystr1 as f, mystr2 as a:


    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')

    next(reader)
    next(reader1)

    for row in zip_longest(reader, reader1, fillvalue=''):
        var1 = row[0][0] if len(row[0]) else ''
        var2 = row[1][1] if len(row[1]) else ''
        res.extend([var1, var2])

print(res)

This example gives me the following error:

Traceback (most recent call last):
  File "thesis.py", line 31, in <module>
    var2 = row[1][1] if len(row[1]) else ''
IndexError: list index out of range

Upvotes: 1

Views: 2359

Answers (1)

jpp

Reputation: 164693

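Use itertools.zip_longest so that iteration continues until the longer file is exhausted, and catch IndexError so that blank rows and missing cells become empty strings. Your check len(row[1]) passes for a row such as ['21'], which has a first cell but no second one, so row[1][1] still raises IndexError. If you would rather drop blank rows entirely, you can use itertools.filterfalse to remove them: csv.reader returns a blank line as an empty list, so these rows can be identified accordingly.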

from itertools import zip_longest
from io import StringIO
import csv

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []

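with mystr1 as f, mystr2 as a:
    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')

    # skip the header rows
    next(reader)
    next(reader1)

    # zip_longest keeps going until the longer reader is exhausted
    for row in zip_longest(reader, reader1, fillvalue=''):
        try:
            var1 = row[0][0]
        except IndexError:
            # blank row, or first file exhausted
            var1 = ''
        try:
            var2 = row[1][1]
        except IndexError:
            # blank row, missing second cell, or second file exhausted
            var2 = ''
        res.extend([var1, var2])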

print(res)

['1', "'Feb'", '2', '', '', "'May'", '3', '', '4', '']
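
If blank rows should be dropped instead of being turned into empty strings, something along these lines might work. This is only a sketch: the helper names cell and is_blank are made up for illustration, and it assumes the same space-delimited sample data as above. It relies on csv.reader yielding an empty list for a blank line, which itertools.filterfalse can then discard before the rows are paired up.

```python
from itertools import filterfalse, zip_longest
from io import StringIO
import csv


def cell(row, index, default=''):
    """Return row[index], or default when that cell does not exist."""
    try:
        return row[index]
    except IndexError:
        return default


def is_blank(row):
    # csv.reader yields an empty list for a blank line
    return not row


mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []

with mystr1 as f, mystr2 as a:
    # drop blank rows before pairing the two files
    reader = filterfalse(is_blank, csv.reader(f, delimiter=' '))
    reader1 = filterfalse(is_blank, csv.reader(a, delimiter=' '))

    # skip the header rows
    next(reader)
    next(reader1)

    # an empty list as fillvalue makes cell() fall back to its default
    for left, right in zip_longest(reader, reader1, fillvalue=[]):
        res.extend([cell(left, 0), cell(right, 1)])

print(res)
# ['1', "'Feb'", '2', '', '3', "'May'", '4', '']
```

Note that this pairs rows differently from the version above: once the blank line is removed, row 3 of the first file lines up with the third row of the second file. Whether that is what you want depends on whether the blank lines in your real files carry any meaning.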

Upvotes: 4
