Paul M

Reputation: 31

Using Python to find a string in a file and extract the integer value on the same line

I'm new to Python and have the following issue.

I have a text file (filename.dat) which provides information about my model. A summary of the relevant portion is as follows:

      NUMBER OF ELEMENTS IS                               1367466
      NUMBER OF NODES IS                                   252624
      NUMBER OF NODES DEFINED BY THE USER                  248291
      NUMBER OF INTERNAL NODES GENERATED BY THE PROGRAM      4333
      TOTAL NUMBER OF VARIABLES IN THE MODEL               783873

I can search for the line using the following python commands:

with open('filename.dat', 'r') as inF:
    for line in inF:
        if 'NUMBER OF ELEMENTS IS' in line:
            print("true")

However, I'm not sure how to extract the integer value (1367466) on the same line as 'NUMBER OF ELEMENTS IS'. Does anyone know how to extract the number from a line that mixes digits with other text?

Upvotes: 3

Views: 2110

Answers (5)

Giota B

Reputation: 81

I would use a regular expression as well:

import re

with open('filename.dat', 'r') as inF:
    for line in inF:
        # Group 1 is the label, group 2 the integer at the end of the line.
        match = re.match(r"\s*([A-Za-z ]+?)\s+([0-9]+)\s*$", line)
        if match:
            items = match.groups()

That gives you a tuple with the label and the number, both as strings.

Upvotes: 0

perreal

Reputation: 97938

One way is to use split:

with open('filename.dat', 'r') as inF:
    for line in inF:
        if 'NUMBER OF ELEMENTS IS' in line:
            print([int(d) for d in line.split() if d.isdigit()])

str.isdigit() returns True if all characters in the string are digits and there is at least one character; otherwise it returns False. line.split() splits the line into words, so for your example you get ['NUMBER', 'OF', 'ELEMENTS', 'IS', '1367466']. isdigit() then works as a filter that selects the words consisting only of digits. This is handy if you are not sure where the digits are. Otherwise you can just grab the word of interest.
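To make the filtering behaviour concrete, here is a minimal sketch. The helper name digits_in_line and the second test string are made up for illustration; they are not from the question.

```python
def digits_in_line(line):
    """Return every whitespace-separated token made up only of digits, as ints."""
    return [int(tok) for tok in line.split() if tok.isdigit()]


# Line copied from the question's sample file.
sample = "      NUMBER OF ELEMENTS IS                               1367466\n"
print(digits_in_line(sample))  # [1367466]

# isdigit() is strict: a sign or a decimal point makes it return False,
# so tokens such as '-5' and '3.14' are silently skipped.
print(digits_in_line("OFFSET IS -5 AND SCALE IS 3.14"))  # []
```

So the split/isdigit filter is fine for plain non-negative integers like the ones in your file, but it would need adjusting for signed or floating-point values.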

Another way is to use regular expressions, but that is overkill for your simple example:

import re

with open('filename.dat', 'r') as inF:
    for line in inF:
        # re.search rather than re.match, because the lines start with spaces
        m = re.search(r'NUMBER OF ELEMENTS IS\s*(\d+)', line)
        if m:
            print(m.group(1))

Upvotes: 0

yonili

Reputation: 713

You can use regular expressions.

import re

with open('filename.dat', 'r') as inF:
    text = inF.read()

matches = re.search(r"NUMBER OF ELEMENTS IS\s+(\d+)", text)
if matches is not None:
    num_of_elem = int(matches.group(1))

The parentheses in the regular expression denote a sub-match (a capturing group) of the matched expression, allowing you to access that part of the match later with the group() method, as shown in the last line.
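If you want all of the counts rather than just one, the same idea extends to two capturing groups. This is only a sketch: the sample text is pasted from the question instead of being read from filename.dat, and it assumes every label consists of uppercase letters and spaces.

```python
import re

# Sample text copied from the question; in practice this would come from
# reading filename.dat.
text = """\
      NUMBER OF ELEMENTS IS                               1367466
      NUMBER OF NODES IS                                   252624
      NUMBER OF NODES DEFINED BY THE USER                  248291
      NUMBER OF INTERNAL NODES GENERATED BY THE PROGRAM      4333
      TOTAL NUMBER OF VARIABLES IN THE MODEL               783873
"""

# Group 1 captures the label, group 2 the integer at the end of the line.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
pattern = re.compile(r"^\s*([A-Z ]+?)\s+(\d+)\s*$", re.MULTILINE)

counts = {}
for m in pattern.finditer(text):
    counts[m.group(1)] = int(m.group(2))

print(counts["NUMBER OF ELEMENTS IS"])  # 1367466
```

Each label then maps to its integer, so counts["NUMBER OF NODES IS"] gives 252624 and so on.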

Upvotes: 0

Pavel Anossov

Reputation: 62888

Split the line by whitespace from the right, once:

In [18]: line.rsplit(None, 1)
Out[18]: ['TOTAL NUMBER OF VARIABLES IN THE MODEL', '783873']

Take the second part:

In [19]: line.rsplit(None, 1)[1]
Out[19]: '783873'

Convert it to int:

In [20]: int(line.rsplit(None, 1)[1])
Out[20]: 783873

You can use tuple unpacking to make the code cleaner (if your entire file is of this format):

with open('filename.dat', 'r') as inF:
    for line in inF:
        label, number = line.rsplit(None, 1)
        if 'NUMBER OF ELEMENTS IS' in label:
            print("true")
            number = int(number)
            ...

If some lines are of a different format, you'll have to search first and split later:

with open('filename.dat', 'r') as inF:
    for line in inF:
        if 'NUMBER OF ELEMENTS IS' in line:
            print("true")
            label, number = line.rsplit(None, 1)   # label is unused then
            number = int(number)
            ...
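The search-first, split-later variant can be wrapped in a small helper. This is a sketch under the assumption that the number is always the last whitespace-separated field on the matching line; the name find_count is hypothetical. It takes any iterable of lines, so an open file object works as well as a list.

```python
def find_count(lines, label):
    """Return the integer ending the first line that contains `label`, or None."""
    for line in lines:
        if label in line:
            # rsplit(None, 1) splits once on whitespace from the right,
            # so the second piece is the trailing number.
            return int(line.rsplit(None, 1)[1])
    return None


# Lines copied from the question; with a real file you could pass the
# open file object instead of this list.
sample_lines = [
    "      NUMBER OF ELEMENTS IS                               1367466\n",
    "      NUMBER OF NODES IS                                   252624\n",
]
print(find_count(sample_lines, "NUMBER OF ELEMENTS IS"))  # 1367466
```

With the real file, this would be called as find_count(inF, 'NUMBER OF ELEMENTS IS') inside the with block.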

Upvotes: 4

mariano

Reputation: 1367

One way of doing it is using str.split() and getting the last element:

In [21]: line = 'NUMBER OF ELEMENTS IS                               1367466'
In [22]: line.split()[-1]
Out[22]: '1367466'

Convert that to int and you have a number. However, this won't work if your number isn't the last thing on the line. Caveat emptor.
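If the number might not be the last thing on the line, one possible fallback is to search for the first run of digits instead. This is a sketch, not part of the approach above; the helper name first_int and the second test line are made up for illustration.

```python
import re


def first_int(line):
    """Return the first run of digits in `line` as an int, or None if there is none."""
    m = re.search(r"\d+", line)
    return int(m.group()) if m else None


print(first_int("NUMBER OF ELEMENTS IS                               1367466"))  # 1367466

# Hypothetical line where the number is followed by more text.
print(first_int("NUMBER OF ELEMENTS IS 1367466 (FINAL)"))  # 1367466
```

Note that this takes the first number it finds, so it has the opposite caveat: it picks the wrong value if an earlier number appears on the line.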

Upvotes: 0
