Extract Numeric Data from a Text file in Python

Question

Say I have a text file with the data/string:

Dataset #1: X/Y= 5, Z=7 has been calculated
Dataset #2: X/Y= 6, Z=8 has been calculated
Dataset #10: X/Y =7, Z=9 has been calculated

I want the output to be written to a CSV file as:

X/Y, X/Y, X/Y

Which should display:

5, 6, 7

Here is my current approach. I am using string.find, but it feels like a difficult way to solve this problem:

data = open('TestData.txt').read()
#index of string
counter = 1

if (data.find('X/Y=')==1):      
#extracts segment out of string
    line = data[r+6:r+14]
    r = data.find('X/Y=')
    counter += 1 
    print line
else: 
    r = data.find('X/Y')
    line = data[r+6:r+14]
    for x in range(0,counter):
    print line


print counter

Error: For some reason, I only get the value 5. When I set up a loop, I get an infinite stream of 5s.

Padraic Cunningham · Accepted Answer

If you want the numbers and every line in your txt file is formatted like the first two, i.e. X/Y= 6 rather than X/Y =7:

import re
result = []
with open("TestData.txt") as f:
    for line in f:
        # Match one or more digits that are preceded by "Y=" and a single whitespace character "\s".
        s = re.search(r'(?<=Y=\s)\d+', line)
        if s:  # re.search returns None when there is no match, so only real matches are added
            result.append(s.group())
print(result)
['5', '6', '7']
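
The sample data in the question mixes "X/Y= 5" and "X/Y =7", and the question asks for CSV output, so a variant that tolerates both spacings and writes CSV rows may be useful. The following is only a sketch under some assumptions: the values are non-negative integers, and the names extract_xy and write_rows are made up for illustration. It swaps the lookbehind for a capturing group with optional whitespace around the "=".

```python
import csv
import io
import re

# Optional whitespace on either side of "=" covers both "X/Y= 5" and "X/Y =7".
XY_PATTERN = re.compile(r'X/Y\s*=\s*(\d+)')


def extract_xy(lines):
    """Return the X/Y value of every line that has one, as strings."""
    values = []
    for line in lines:
        match = XY_PATTERN.search(line)
        if match:  # skip lines without an X/Y field
            values.append(match.group(1))
    return values


def write_rows(values, out):
    """Write a header row of "X/Y" labels followed by one row of values."""
    writer = csv.writer(out)
    writer.writerow(["X/Y"] * len(values))
    writer.writerow(values)


sample = [
    "Dataset #1: X/Y= 5, Z=7 has been calculated",
    "Dataset #2: X/Y= 6, Z=8 has been calculated",
    "Dataset #10: X/Y =7, Z=9 has been calculated",
]
values = extract_xy(sample)
print(values)  # ['5', '6', '7']

# An in-memory buffer stands in for a real file here; in practice
# something like open("output.csv", "w", newline="") would be passed instead.
buffer = io.StringIO()
write_rows(values, buffer)
print(buffer.getvalue())
```

To read from the real file, an open file object can be passed directly, e.g. extract_xy(f) inside a with open("TestData.txt") as f: block, since the function only iterates over lines.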

To match the pattern in your comment, you should escape the period as \. or you will also match strings like 1.2+3. An unescaped "." has a special meaning in re: it matches any character.

So re.search(r'(?<=Counting Numbers =\s)\d\.\d\.\d',s).group() will return only 1.2.3
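
A quick way to see the difference is to run both versions of the digit pattern against a string with real periods and one without. This is just an illustrative check; the two test strings are examples, not data from the question.

```python
import re

unescaped = r'\d.\d.\d'   # "." matches any character except a newline
escaped = r'\d\.\d\.\d'   # "\." matches only a literal period

for text in ("1.2.3", "1.2+3"):
    # Report whether each pattern finds a match in the text.
    print(text,
          bool(re.search(unescaped, text)),
          bool(re.search(escaped, text)))
# 1.2.3 True True
# 1.2+3 True False
```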

To be more explicit, you can use the full X/Y=\s prefix in the lookbehind: s = re.search(r'(?<=X/Y=\s)\d+', line).

Using the original line from your comment together with the updated line, it would return:

['5', '6', '7', '5', '5']

The (?<=Y=\s) part is called a positive lookbehind assertion.

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position

There are lots of nice examples in the re documentation. The text inside the lookbehind parentheses is not included in the returned match.
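
As a small illustration of that last point, the sketch below compares the lookbehind with an ordinary capturing group on the first sample line from the question. It also shows one limitation worth knowing: Python's re module only accepts fixed-width lookbehinds, so a prefix containing \s* is rejected when the pattern is compiled.

```python
import re

line = "Dataset #1: X/Y= 5, Z=7 has been calculated"

# Lookbehind: "Y= " must precede the digits but is not part of the match.
lookbehind = re.search(r'(?<=Y=\s)\d+', line)
print(lookbehind.group())   # 5

# Capturing group: the whole match includes the prefix; group(1) is the digits.
captured = re.search(r'Y=\s(\d+)', line)
print(captured.group())     # Y= 5
print(captured.group(1))    # 5

# A variable-width lookbehind such as (?<=Y\s*=\s*) raises re.error.
try:
    re.compile(r'(?<=Y\s*=\s*)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)    # False
```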
