Branzol

Reputation: 373

Python Reading File and Skipping Invalid Lines

I've been trying to write some code to read a CSV file. Some of the lines in the CSV are not complete. I would like the code to skip a bad line if there is data missing in one of the fields. I'm using the following code.

def Test():

    dataFile = open('test.txt','r')
    readFile = dataFile.read()
    lineSplit = readFile.split('\n')

    for everyLine in lineSplit:
        dividedLine = everyLine.split(';')
        a = dividedLine[0]
        b = dividedLine[1]
        c = dividedLine[2]
        d = dividedLine[3]
        e = dividedLine[4]
        f = dividedLine[5]
        g = dividedLine[6]

        print (a,b,c,d,e,f,g)

Upvotes: 1

Views: 2278

Answers (3)

martineau

Reputation: 123473

In my opinion, the Pythonic way to do this would be to use the included csv module in conjunction with a try/except block (while following PEP 8 - Style Guide for Python Code).

import csv

def test():
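    # newline='' is what the csv module expects for text-mode files in Python 3
    with open('reading_test.txt', newline='') as data_file:
        # the data in the question is semicolon-separated
        for line in csv.reader(data_file, delimiter=';'):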
            try:
                a,b,c,d,e,f,g = line
            except ValueError:
                continue  # ignore the line
            print(a,b,c,d,e,f,g)

test()

This approach is called "It's Easier to Ask Forgiveness than Permission" (EAFP). The other more common style is referred to as "Look Before You Leap" (LBYL). You can read more about them in this snippet from a book by a very authoritative author.
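To make the difference between the two styles concrete, here is a minimal sketch that applies each one to an already-split row. The helper names parse_eafp and parse_lbyl and the constant EXPECTED_FIELDS are made up for illustration, and the seven-field count is an assumption taken from the question's code.

```python
EXPECTED_FIELDS = 7  # assumption: seven columns, as in the question


def parse_eafp(fields):
    """EAFP: just try the unpacking and recover if it fails."""
    try:
        a, b, c, d, e, f, g = fields
    except ValueError:  # too few or too many values to unpack
        return None
    return (a, b, c, d, e, f, g)


def parse_lbyl(fields):
    """LBYL: check the length first, then use the values."""
    if len(fields) != EXPECTED_FIELDS:
        return None
    return tuple(fields)


rows = [
    ["1", "2", "3", "4", "5", "6", "7"],  # complete row
    ["1", "2", "3"],                      # incomplete row
]
for row in rows:
    print(parse_eafp(row), parse_lbyl(row))
```

Both helpers return None for a row that should be skipped, so the calling loop looks the same whichever style you prefer.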

Upvotes: 2

Javeed

Reputation: 221

This seems less Python-specific than conceptual. A line parsed from a CSV row is invalid if:

1. It is shorter than the minimum required length (i.e. it is missing elements).
2. One or more parsed entries come back empty or None (only if all elements are required).
3. The type of an element doesn't match the intended type of its column (outside the scope of what you asked, but good to keep in mind).

In Python, once you have split the line, you can check the first two conditions with

if len(dividedLine) < intended_length or "" in dividedLine: continue

The first part just needs the intended length of a row; you can usually take that from the header row. For the second part, the quotes could be replaced with None or something similar, but split returns an empty string for a missing field, so in this case use "".
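As a rough sketch of the first two checks, assuming the first line of the file is a header row that defines the intended length (the function name is_valid_row and the sample data are invented for illustration):

```python
def is_valid_row(fields, intended_length):
    """Return True if the row is long enough and has no empty field."""
    return len(fields) >= intended_length and "" not in fields


lines = [
    "id;name;city",  # header row: defines the intended length
    "1;Ann;Oslo",    # complete
    "2;;Rome",       # empty field
    "3;Cy",          # too short
]

# Derive the intended length from the header row.
intended_length = len(lines[0].split(";"))

valid = []
for line in lines[1:]:
    fields = line.split(";")
    if not is_valid_row(fields, intended_length):
        continue  # skip the bad line
    valid.append(fields)

print(valid)
```

Only the row for "Ann" survives here. The third condition (type checking) is not covered by this sketch.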

HTH

Upvotes: 0

André Fratelli

Reputation: 6068

Given that you cannot know beforehand whether a given line is incomplete, you need to check whether it is complete and skip it if it is not. You can use continue for this, which makes the for loop move on to the next iteration:

def Test():

    dataFile = open('test.txt','r')
    readFile = dataFile.read()
    lineSplit = readFile.split('\n')

    for everyLine in lineSplit:
        dividedLine = everyLine.split(';')

        if len(dividedLine) != 7:
            continue

        a = dividedLine[0]
        b = dividedLine[1]
        c = dividedLine[2]
        d = dividedLine[3]
        e = dividedLine[4]
        f = dividedLine[5]
        g = dividedLine[6]

        print (a,b,c,d,e,f,g)
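The same length check can also be wrapped in a small generator, which keeps the skipping logic separate from whatever you do with the good rows. This is only a sketch: the name iter_complete_rows, its parameters, and the sample lines are made up for illustration, and it assumes a complete line has exactly seven semicolon-separated fields.

```python
def iter_complete_rows(lines, expected=7, sep=";"):
    """Yield the fields of every line that has exactly `expected` fields."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) != expected:
            continue  # incomplete (or over-long) line: skip it
        yield fields


# In-memory stand-in for the lines of a file.
sample = ["1;2;3;4;5;6;7\n", "1;2;3\n", "\n", "a;b;c;d;e;f;g\n"]

for a, b, c, d, e, f, g in iter_complete_rows(sample):
    print(a, b, c, d, e, f, g)
```

Since iterating over a file object yields its lines, a file opened with "with open('test.txt') as data_file:" can be passed in place of the sample list, and the with block closes the file for you.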

Upvotes: 0
