Reputation: 31
I'm new to Python and having the following issue.
I have a text file (filename.dat) which provides information about my model. A summary of the relevant portions is as follows:
NUMBER OF ELEMENTS IS 1367466
NUMBER OF NODES IS 252624
NUMBER OF NODES DEFINED BY THE USER 248291
NUMBER OF INTERNAL NODES GENERATED BY THE PROGRAM 4333
TOTAL NUMBER OF VARIABLES IN THE MODEL 783873
I can search for the line using the following Python commands:
with open('filename.dat', 'r') as inF:
    for line in inF:
        if 'NUMBER OF ELEMENTS IS' in line:
            print("true")
However, I'm not sure how to extract the integer value (1367466) on the same line as 'NUMBER OF ELEMENTS IS'. Does anyone know how to extract the number from a line that also contains other text?
Upvotes: 3
Views: 2110
Reputation: 81
I would choose a regular expression as well:
import re
with open('filename.dat', 'r') as inF:
    for line in inF:
        # group 1: the label (letters and spaces); group 2: the digits
        match = re.match(r"([A-Za-z ]+?)\s*([0-9]+)", line)
        if match:
            items = match.groups()
That gives you a tuple containing the label and the number, both as strings.
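As a quick check of what groups() returns, here is a minimal sketch run against one sample line from the question. The pattern (a lazy run of letters and spaces followed by digits) and the helper name split_label_and_number are assumptions for illustration, not something the file format guarantees.

```python
import re

def split_label_and_number(line):
    """Return (label, number) for a 'TEXT 123' style line, or None."""
    # Assumed layout: letters and spaces first, then one run of digits.
    match = re.match(r"([A-Za-z ]+?)\s*([0-9]+)", line)
    if match is None:
        return None
    label, digits = match.groups()  # a tuple of two strings
    return label, int(digits)       # convert the captured digits

print(split_label_and_number("NUMBER OF ELEMENTS IS 1367466"))
```

Lines that do not start with letters or spaces followed by digits return None, so the caller can skip them.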
Upvotes: 0
Reputation: 97938
One way is to use split:
with open('filename.dat', 'r') as inF:
    for line in inF:
        if 'NUMBER OF ELEMENTS IS' in line:
            print([int(d) for d in line.split() if d.isdigit()])
str.isdigit() returns True if all characters in the string are digits and there is at least one character; otherwise it returns False. line.split() splits the line into words, so for your example you will get ['NUMBER', 'OF', 'ELEMENTS', 'IS', '1367466']. isdigit() then works as a filter to select the words consisting entirely of digits. This might be handy if you are not sure where the digits are. Otherwise you can just grab the word of interest.
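To make the filtering step concrete, here is a small sketch that applies the same split-and-isdigit idea to single lines. The function name digits_in_line and the second and third sample lines are made up for illustration.

```python
def digits_in_line(line):
    """Return every all-digit word in the line as an int."""
    return [int(word) for word in line.split() if word.isdigit()]

print(digits_in_line("NUMBER OF ELEMENTS IS 1367466"))  # [1367466]
print(digits_in_line("NO DIGITS HERE"))                 # []
print(digits_in_line("VALUE -5 3.14 7"))                # [7]
```

Note that isdigit() rejects words such as '-5' and '3.14', so this approach only picks up plain non-negative integers.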
Another way is to use regular expressions, but that is overkill for your simple example:
import re
with open('input', 'r') as inF:
    for line in inF:
        m = re.match(r'NUMBER OF ELEMENTS IS\s*(\d+)', line)
        if m:
            print(m.group(1))
Upvotes: 0
Reputation: 713
You can use regular expressions.
import re
with open('filename.dat', 'r') as inF:
    text = inF.read()
matches = re.search(r"NUMBER OF ELEMENTS IS\s+(\d+)", text)
if matches is not None:
    # group(1) holds the captured digits as a string
    num_of_elem = int(matches.group(1))
The parentheses in the regular expression denote a sub-match (a capturing group) of the matched expression, allowing you to access this part of the match later on using the group() method (as shown in the last line).
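The difference between the whole match and the captured group can be seen in a short sketch. The sample text is copied from the question's excerpt; find_element_count is a hypothetical helper name, and the text is passed in as a string so the example runs without a file.

```python
import re

def find_element_count(text):
    """Return the integer after 'NUMBER OF ELEMENTS IS', or None."""
    m = re.search(r"NUMBER OF ELEMENTS IS\s+(\d+)", text)
    if m is None:
        return None
    # m.group(0) is the entire matched text, label included;
    # m.group(1) is only what the parentheses captured.
    return int(m.group(1))

sample = "NUMBER OF ELEMENTS IS 1367466\nNUMBER OF NODES IS 252624\n"
print(find_element_count(sample))
```

Because re.search scans the whole string, this works on the full file contents as well as on a single line.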
Upvotes: 0
Reputation: 62888
Split the line by whitespace from the right, once:
In [18]: line.rsplit(None, 1)
Out[18]: ['TOTAL NUMBER OF VARIABLES IN THE MODEL', '783873']
Take the second part:
In [19]: line.rsplit(None, 1)[1]
Out[19]: '783873'
Convert it to int:
In [20]: int(line.rsplit(None, 1)[1])
Out[20]: 783873
You can use tuple unpacking to make the code cleaner (if your entire file is of this format):
with open('filename.dat', 'r') as inF:
    for line in inF:
        label, number = line.rsplit(None, 1)
        if 'NUMBER OF ELEMENTS IS' in label:
            print("true")
            number = int(number)
            ...
If some lines are of a different format, you'll have to search first and split later:
with open('filename.dat', 'r') as inF:
    for line in inF:
        if 'NUMBER OF ELEMENTS IS' in line:
            print("true")
            label, number = line.rsplit(None, 1)  # label is unused then
            number = int(number)
            ...
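If several of these counts are needed, the same rsplit idea can collect them in one pass. This is only a sketch, assuming every line of interest ends in a plain integer; parse_counts is a hypothetical helper name, and the lines are passed as a list so the example runs without a file.

```python
def parse_counts(lines):
    """Map each label to its trailing integer, skipping lines that don't fit."""
    counts = {}
    for line in lines:
        parts = line.rsplit(None, 1)  # split once, from the right
        if len(parts) == 2 and parts[1].isdigit():
            label, number = parts
            counts[label] = int(number)
    return counts

sample_lines = [
    "NUMBER OF ELEMENTS IS 1367466\n",
    "NUMBER OF NODES IS 252624\n",
    "\n",  # blank lines are skipped rather than raising an error
]
counts = parse_counts(sample_lines)
print(counts["NUMBER OF ELEMENTS IS"])
```

Checking the length of the split result before unpacking avoids the ValueError that tuple unpacking would raise on a blank or one-word line.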
Upvotes: 4
Reputation: 1367
One way of doing it is using str.split() and getting the last element:
In [21]: line = 'NUMBER OF ELEMENTS IS 1367466'
In [22]: line.split()[-1]
Out[22]: '1367466'
Convert that to int and you have a number. However, this won't work if your number isn't the last thing on the line. Caveat emptor.
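The caveat can be made explicit in code. In this sketch, last_word_as_int is a hypothetical helper that returns None instead of raising when the last word is not a number; the second sample line is invented to show the failure case.

```python
def last_word_as_int(line):
    """Return the last word of the line as an int, or None if it is not one."""
    words = line.split()
    if not words:
        return None  # empty or whitespace-only line
    try:
        return int(words[-1])
    except ValueError:
        return None  # the last word was not a number

print(last_word_as_int("NUMBER OF ELEMENTS IS 1367466"))
print(last_word_as_int("1367466 ELEMENTS FOUND"))
```

The second call returns None because the number is at the start of the line rather than the end.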
Upvotes: 0