Reputation: 63
Say I have a text file with the data/string:
Dataset #1: X/Y= 5, Z=7 has been calculated
Dataset #2: X/Y= 6, Z=8 has been calculated
Dataset #10: X/Y =7, Z=9 has been calculated
I want the output to be on a csv file as:
X/Y, X/Y, X/Y
Which should display:
5, 6, 7
Here is my current approach, I am using string.find, but I feel like this is rather difficult in solving this problem:
data = open('TestData.txt').read()
#index of string
counter = 1
if (data.find('X/Y=')==1):
#extracts segment out of string
line = data[r+6:r+14]
r = data.find('X/Y=')
counter += 1
print line
else:
r = data.find('X/Y')`enter code here`
line = data[r+6:r+14]
for x in range(0,counter):
print line
print counter
Error: For some reason, I'm only getting the value of 5. when I setup a #loop, i get infinite 5's.
Upvotes: 2
Views: 4324
Reputation: 180411
If you want the numbers and your txt file is formatted like the first two lines i.e X/Y= 6
, not like X/Y =7
:
import re
result=[]
with open("TestData.txt") as f:
for line in f:
s = re.search(r'(?<=Y=\s)\d+',line) # pattern matches up to "Y" followed by "=" and a space "\s" then a digit or digits.
if s: # if there is a match i.e re.search does not return None, add match to the list.
result.append(s.group())
print result
['5', '6', '7']
To match the pattern in your comment, you should escape the period like . or you will match strings like 1.2+3 etc.. the "." has special meaning re.
So re.search(r'(?<=Counting Numbers =\s)\d\.\d\.\d',s).group()
will return only 1.2.3
If it makes it more explicit, you can use s=re.search(r'(?<=X/Y=\s)\d+',line)
using the full X/Y=\s
pattern.
Using the original line in your comment and updated line would return :
['5', '6', '7', '5', '5']
The (?<=Y=\s
)is called a positive lookbehind assertion.
(?<=...)
Matches if the current position in the string is preceded by a match for ... that ends at the current position
There are lots of nice examples here in the re documentation. The items in the parens are not returned.
Upvotes: 3
Reputation: 15306
Since it appears that the entities are all on a single line, I would recommend using readline
in a loop
to read the file line-by-line and then using a regex
to parse out the components you're looking for from that line.
Edit re: OP's comment:
One regex pattern that could be used to capture the number given the specified format in this case would be: X/Y\s*=\s*(.+),
Upvotes: 1