gurkha_dawg

Reputation: 49

How can I use a regular expression when reading a text file in Python?

Here is an example: I am trying to print the lines of a file that contain the integer -9999.

19940325       78     -28   -9999
19940326       50      17     102
19940327      100     -11   -9999
19940328       56     -33       0
19940329       61     -39   -9999
19940330       61     -56       0
19940331      139     -61   -9999
19940401      211       6       0

Here is my code, which uses a regex to scan the text file for the integer -9999 and print only the lines that contain it.

import re

file= open("USC00110072.txt", "r")


for line in file.readlines():
    if re.search('^-9999$', line, re.I):
        print line

My code runs without error but doesn't show anything in the output. Please let me know what mistake I have made.

Upvotes: 0

Views: 120

Answers (3)

dawg

Reputation: 103844

You can use filter:

with open(fn) as f:
    print filter(lambda line: '-9999' in line.split()[-1], f)

This will check whether '-9999' is in the final column of each line.
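
In Python 3, filter returns a lazy iterator and print is a function, so the snippet above would need adjusting. Here is a rough sketch of an equivalent using a list comprehension, assuming whitespace-separated columns. io.StringIO stands in for the real file, last_column_is is a hypothetical helper name, and the blank row is added only to show the guard:

```python
import io

def last_column_is(line, value='-9999'):
    # split() with no argument splits on any run of whitespace.
    fields = line.split()
    # Guard against blank lines, where fields would be empty.
    return bool(fields) and fields[-1] == value

# io.StringIO stands in for open(fn); the data rows are copied from the question.
f = io.StringIO(
    "19940325       78     -28   -9999\n"
    "19940326       50      17     102\n"
    "\n"
    "19940327      100     -11   -9999\n"
)

selected = [line.rstrip('\n') for line in f if last_column_is(line)]
for line in selected:
    print(line)
```

Comparing the last field with == rather than in also avoids matching a value such as -99990.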

If you want to use a regex:

with open(fn) as f:
    for line in f:
        if re.search(r'-9999$', line): # remove $ if the -9999 can be anywhere in the line
            print line.strip()

The pattern '^-9999$' you have will never match unless the line contains -9999 and nothing else: ^ anchors the match to the start of the line and $ anchors it to the end.
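
To see the effect of the anchors, here is a small sketch in Python 3 syntax. The sample line is copied from the question and the variable names are only illustrative:

```python
import re

line = "19940325       78     -28   -9999\n"

# '^' anchors at the start of the line and '$' at the end, so this
# pattern only matches a line consisting of exactly "-9999".
whole_line = re.search(r'^-9999$', line)

# Without '^', the pattern only has to match at the end of the line.
# '$' also matches just before a trailing newline, so no strip() is needed.
end_of_line = re.search(r'-9999$', line)

print(whole_line)            # None
print(end_of_line.group())   # -9999
```

If the lines could carry trailing spaces, a pattern such as r'-9999\s*$' would probably be safer.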

Or, just use in to test the presence of the string:

with open(fn) as f:
    for line in f:
        if '-9999' in line:
            print line.strip()

Upvotes: 1

Wayne Werner

Reputation: 51807

Alternatively, if your file is tab-delimited, you could treat it as CSV and use the csv module:

import csv
import io

file = io.StringIO(u'''
19940325\t78\t-28\t-9999
19940326\t50\t17\t102
19940327\t100\t-11\t-9999
19940328\t56\t-33\t0
19940329\t61\t-39\t-9999
19940330\t61\t-56\t0
19940331\t139\t-61\t-9999
19940401\t211\t6\t0
'''.strip())

reader = csv.reader(file, delimiter='\t')
for row in reader:
    if row[-1] == '-9999':   # or, for regex, `re.match(r'^-9999$', row[-1])`
        print('\t'.join(row))

Upvotes: 1

Cory Kramer

Reputation: 117876

Regex is likely overkill for this; a simple substring check using the in operator seems sufficient:

with open("USC00110072.txt") as f:
    for line in f:
        if '-9999' in line:
            print(line.rstrip('\n'))  # line already ends with a newline

Or, if you're concerned about matching -9999 only as a "whole word", you can do a little more to split the line into its values (this assumes tab-separated columns):

with open("USC00110072.txt") as f:
    for line in f:
        if '-9999' in line.strip().split('\t'):
            print(line)
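
The sample in the question looks space-aligned rather than tab-separated, in which case split('\t') would return the whole line as a single field. Here is a minimal sketch assuming whitespace-separated columns. has_missing_value is a hypothetical helper name, the first two sample lines are copied from the question, and the third is made up to show the substring case:

```python
def has_missing_value(line, marker='-9999'):
    # split() with no argument splits on any run of whitespace
    # (spaces or tabs) and ignores the trailing newline.
    return marker in line.split()

sample = [
    "19940325       78     -28   -9999\n",
    "19940326       50      17     102\n",
    "19940327      100  -99990       0\n",  # made-up line: "-9999" appears only as a substring
]

matches = [line.rstrip('\n') for line in sample if has_missing_value(line)]
print(matches)
```

Only the first line is selected, because the membership test compares whole fields instead of substrings.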

Upvotes: 3
