Jason Jabbour

Reputation: 47

Python text processing/finding data

I am trying to parse some information from a text file using Python. The file contains names, employee numbers, and other data. I do not know the names or employee numbers ahead of time, but I do know that each name is followed by the text "Per End" and each employee number is preceded by the text "File:". I can find these markers using the .find() method, but how do I ask Python to look at the line that comes before "Per End" and the line that comes after "File:"? In this specific case the output should be the name and the employee number.

The text looks like this:

SMITH, John
Per End: 12/10/2016
File:
002013
Dept:
000400
Rate:10384 60

My code is thus:

file = open("Register.txt", "rt")
lines = file.readlines()
file.close()

countPer = 0
for line in lines:
    line = line.strip()
    print (line)
    if line.find('Per End') != -1:
        countPer += 1
print ("Per End #'s: ", countPer)

Upvotes: 2

Views: 107

Answers (2)

be_good_do_good

Reputation: 4441

file = open("Register.txt", "rt")
lines = file.readlines()
file.close()

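for indx, line in enumerate(lines):
    line = line.strip()
    print(line)
    # The name sits on the line before the 'Per End' marker
    if line.find('Per End') != -1:
        print(lines[indx-1].strip())
    # The employee number sits on the line after the 'File:' marker
    if line.find('File:') != -1:
        print(lines[indx+1].strip())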

enumerate(lines) yields each line together with its index, so you can reach the previous and next lines through lines[indx-1] and lines[indx+1].

Here is the output when run directly in the Python shell:

>>> file = open("r.txt", "rt")
>>> lines  = file.readlines()
>>> file.close()
>>> lines
['SMITH, John\n', 'Per End: 12/10/2016\n', 'File:\n', '002013\n', 'Dept:\n', '000400\n', 'Rate:10384 60\n']

>>> for indx, line in enumerate(lines):
...     line = line.strip()
...     if line.find('Per End') != -1:
...         print(lines[indx-1].strip())
...     if line.find('File:') != -1:
...         print(lines[indx+1].strip())

SMITH, John
002013
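
One caveat with indexing neighbours this way: if 'Per End' happens to be on the very first line, lines[indx-1] silently wraps around to the last line, and if 'File:' is on the last line, lines[indx+1] raises an IndexError. A minimal sketch of a guarded version follows; the function name find_name_and_number is made up for illustration, and it assumes the layout shown in the question (name on the line before 'Per End', number on the line after 'File:').

```python
def find_name_and_number(lines):
    """Collect the lines before 'Per End' and after 'File:' (hypothetical helper)."""
    names, numbers = [], []
    for indx, line in enumerate(lines):
        # Only look backwards when a previous line actually exists
        if 'Per End' in line and indx > 0:
            names.append(lines[indx - 1].strip())
        # Only look forwards when a next line actually exists
        if 'File:' in line and indx + 1 < len(lines):
            numbers.append(lines[indx + 1].strip())
    return names, numbers


# Same lines as in the shell session above
sample = ['SMITH, John\n', 'Per End: 12/10/2016\n', 'File:\n', '002013\n',
          'Dept:\n', '000400\n', 'Rate:10384 60\n']
names, numbers = find_name_and_number(sample)
print(names, numbers)  # ['SMITH, John'] ['002013']
```

Returning lists instead of printing also makes it easier to reuse the values later, for example to write them to another file.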

Upvotes: 1

Shawn Mehan

Reputation: 4568

Here is how I would do it.

First, some test data.

test = """SMITH, John
Per End: 12/10/2016
File:
002013
Dept:
000400
Rate:10384 60
"""

text = [line for line in test.splitlines(keepends=False) if line != ""]

Now for the real answer.

count_per, count_num = 0, 0

Using enumerate on an iterable gives you an index automagically.

for idx, line in enumerate(text):

    # Just test whether what you're looking for is in the `str`

    if 'Per End' in line:
        print(text[idx - 1]) # access the full set of lines with idx
        count_per += 1
    if 'File:' in line:
        print(text[idx + 1])
        count_num += 1

print("Per Ends = {}".format(count_per))
print("Files = {}".format(count_num))

yields for me:

SMITH, John
002013
Per Ends = 1
Files = 1
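
If the real file holds several employees, it may help to keep each name together with its number rather than printing them separately. The following is only a sketch under assumptions: parse_register is a hypothetical name, the second employee in the sample is invented test data, and every record is assumed to follow the layout from the question (name, then 'Per End', then 'File:' on its own line with the number on the next line).

```python
def parse_register(lines):
    """Pair each name with its employee number (hypothetical helper)."""
    # Drop blank lines so "previous" and "next" always refer to real content
    text = [line.strip() for line in lines if line.strip()]
    records = []
    current = None
    for idx, line in enumerate(text):
        if 'Per End' in line and idx > 0:
            # The name is the line just before the 'Per End' marker
            current = {'name': text[idx - 1], 'number': None}
            records.append(current)
        elif line.startswith('File:') and current is not None and idx + 1 < len(text):
            # The employee number is the line just after the 'File:' marker
            current['number'] = text[idx + 1]
    return records


# First record is from the question; the second one is made up for the example
sample_lines = [
    'SMITH, John', 'Per End: 12/10/2016', 'File:', '002013',
    'Dept:', '000400', 'Rate:10384 60',
    'DOE, Jane', 'Per End: 12/10/2016', 'File:', '002014',
]

for record in parse_register(sample_lines):
    print(record['name'], record['number'])
```

With the sample above this prints one line per employee: SMITH, John 002013 and DOE, Jane 002014.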

Upvotes: 0
