Ojayer

Reputation: 39

Python: trying to sort out lines from a text file

I'm trying to separate "good" numbers from "bad" ones. My problem is that some of the numbers I'm reading from the text file contain spaces (" "). The functions below classify lines by splitting on spaces, so every line that contains a space shows up as a bad number, regardless of whether it is actually good or bad.

Does anyone have an idea how to sort them out? This is what I'm using right now:

def showGoodNumbers():
    print("all good numbers:")
    textfile = open("textfile.txt", "r")
    for line in textfile.readlines():
        split_line = line.split(' ')
        if len(split_line) == 1:
            print(split_line)  # split() returns a list, so this prints a list
    textfile.close()  # close must be called, not just referenced

def showBadNumbers():
    print("all bad numbers:")
    textfile = open("textfile.txt", "r")
    for line in textfile.readlines():
        split_line = line.split(' ')
        if len(split_line) > 1:
            print(split_line)  # split() returns a list, so this prints a list
    textfile.close()  # close must be called, not just referenced

The text file looks like this (all entries with a comment are "bad"):

Upvotes: 0

Views: 119

Answers (2)

thiruvenkadam

Reputation: 4260

String manipulation is all you need here.

# Characters that may appear in a good line besides digits
allowed_chars = ['-', '.', ' ', '\n']
with open("textfile.txt", "r") as fp:
    for line in fp:
        line_check = line
        # Strip every allowed character; only digits should remain
        for chars in allowed_chars:
            line_check = line_check.replace(chars, '')
        if line_check.isdigit():
            print("Good line:", line)
        else:
            print("Bad line:", line)

You can add any number of characters to the allowed_chars list; keeping them in a list makes that easy. Based on the comments, I added \n to allowed_chars so that the trailing newline character is also handled.
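
The same check can also be wrapped in a small function, which makes it easy to try on individual lines without a file. This is only a minimal sketch, assuming Python 3; the name is_good_line and the default allowed string are illustrative and not part of the code above.

```python
def is_good_line(line, allowed="-. \n"):
    """Return True if the line holds only digits plus the allowed characters."""
    # Drop every allowed character, then require that only digits remain.
    # An empty result (e.g. a blank line) counts as bad, since "".isdigit() is False.
    stripped = "".join(ch for ch in line if ch not in allowed)
    return stripped.isdigit()


print(is_good_line("235235-23523\n"))
print(is_good_line("235235 - too short\n"))
```

Note that this only checks which characters appear; it does not check the length or grouping of the digits.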

Upvotes: 1

James Mills

Reputation: 19050

This is (yet another) classic example of where the Python re module really shines:

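from re import match


with open("textfile.txt", "r") as f:
    for line in f:
        # Good lines contain only digits, hyphens and spaces;
        # $ also matches just before the trailing newline
        if match("^[0-9- ]*$", line):
            print("Good Line:", line)
        else:
            print("Bad Line:", line)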

Output:

Good Line: 13513 51235

Good Line: 235235-23523

Bad Line: 2352352-23 - not valid

Bad Line: 235235 - too short

Good Line: 324-134 3141

Bad Line: 23452566246 - too long
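
To get the two separate listings the question asks for, the same kind of pattern can be used to collect the lines into two lists. What follows is a rough sketch rather than a definitive solution, assuming Python 3.4+ (for re.fullmatch); the names NUMBER_PATTERN and partition_lines are made up for illustration, and the sample lines are copied from the output above.

```python
import re

# Digits, spaces and hyphens only; fullmatch requires the whole string to match.
NUMBER_PATTERN = re.compile(r"[0-9 -]+")


def partition_lines(lines):
    """Split lines into (good, bad) lists, ignoring surrounding whitespace."""
    good, bad = [], []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank lines instead of classifying them
        if NUMBER_PATTERN.fullmatch(text):
            good.append(text)
        else:
            bad.append(text)
    return good, bad


sample = [
    "13513 51235\n",
    "235235-23523\n",
    "2352352-23 - not valid\n",
    "235235 - too short\n",
    "324-134 3141\n",
    "23452566246 - too long\n",
]
good, bad = partition_lines(sample)
print("Good:", good)
print("Bad:", bad)
```

With a real file, the open file object can be passed in place of sample, since iterating over it yields lines in the same way.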

Upvotes: 5
