Reputation: 872
I am using Python regex to extract certain values from a given string. This is my string:
mystring.txt
sometext
somemore text here
some other text
course: course1
Id Name marks
____________________________________________________
1 student1 65
2 student2 75
3 MyName 69
4 student4 43
course: course2
Id Name marks
____________________________________________________
1 student1 84
2 student2 73
8 student7 99
4 student4 32
course: course4
Id Name marks
____________________________________________________
1 student1 97
3 MyName 60
8 student6 82
I need to extract the course name and corresponding marks for a particular student. For example, I need the course and marks for MyName from the above string.
I tried:
re.findall(".*?course: (\w+).*?MyName\s+(\d+).*?",buff,re.DOTALL)
This works only if MyName is present under every course; it fails when MyName is missing from some course, as in my example string.
Here I get this output: [('course1', '69'), ('course2', '60')]
but what I actually want to achieve is: [('course1', '69'), ('course4', '60')]
What would be the correct regex for this?
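#!/usr/bin/python
import re

# Read the whole file into one string so the pattern can span lines.
with open("mystring.txt", "r") as buffer_fp:
    buff = buffer_fp.read()

# My attempt: it pairs a course with the next MyName row, even one under a later course.
print(re.findall(r".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL))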
Upvotes: 5
Views: 443
Reputation: 284
I suspect this is impossible to do in a single regular expression. They are not all-powerful.
Even if you find a way, don't do this. Your non-working regex is already close to unreadable; a working solution is likely to be even more so. You can most likely do this in just a few lines of meaningful code. Pseudocode solution:
for each line in buff:
    if it is a course line:
        set the course variable
    if it is a MyName line:
        add (course, marks) to the list of matches
Note that this could (and probably should) involve regexes in each of those if blocks. It's not a case of choosing between the hammer and the screwdriver to the exclusion of the other, but rather using them both for what they do best.
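A minimal sketch of that pseudocode in Python, assuming the layout shown in the question (a "course: name" header line followed by "Id Name marks" rows). The names marks_for, COURSE_RE and ROW_RE are made up for illustration, and the sample text is a shortened version of the question's data:

```python
import re

# Hypothetical patterns; they assume the layout shown in the question.
COURSE_RE = re.compile(r"^course:\s*(\w+)")
ROW_RE = re.compile(r"^\d+\s+(\S+)\s+(\d+)\s*$")


def marks_for(text, student):
    """Return (course, marks) pairs for one student, in file order."""
    matches = []
    course = None  # most recent course header seen so far
    for line in text.splitlines():
        line = line.strip()
        course_match = COURSE_RE.match(line)
        if course_match:
            course = course_match.group(1)
            continue
        row_match = ROW_RE.match(line)
        if row_match and course is not None and row_match.group(1) == student:
            matches.append((course, row_match.group(2)))
    return matches


# Shortened sample, assumed to follow the same format as mystring.txt.
sample = """sometext
course: course1
Id Name marks
1 student1 65
3 MyName 69
course: course2
Id Name marks
1 student1 84
course: course4
Id Name marks
3 MyName 60
"""

print(marks_for(sample, "MyName"))  # [('course1', '69'), ('course4', '60')]
```

Comparing the Name column exactly, rather than searching for the name as a substring, also keeps a student called MyName2 from being picked up by mistake.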
Upvotes: 2
Reputation: 67968
.*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?
                ^^^^^^^^^^^^^^^^^^^^
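You can try this. See the demo. It uses a lookahead-based quantifier: each character consumed between the course name and MyName is first checked so that it does not start the word "course". The match can therefore reach MyName only within the same course block, never past the next "course:" header.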
https://regex101.com/r/pG1kU1/26
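For reference, here is a rough sketch of how that pattern could be used from Python. The function name marks_by_regex and the shortened sample are assumptions for illustration. The leading and trailing .*? are left out because findall already scans the string for matches; the lookahead part is the same as in the pattern above:

```python
import re


def marks_by_regex(text, student):
    """Return (course, marks) pairs using the lookahead-guarded pattern."""
    pattern = re.compile(
        # (?:(?!\bcourse\b).)* consumes characters one at a time and
        # refuses to step onto the next whole word "course".
        r"course: (\w+)(?:(?!\bcourse\b).)*"
        + re.escape(student)
        + r"\s+(\d+)",
        re.DOTALL,  # let "." cross line breaks
    )
    return pattern.findall(text)


# Shortened sample, assumed to follow the same format as mystring.txt.
sample = """sometext
course: course1
Id Name marks
1 student1 65
3 MyName 69
course: course2
Id Name marks
1 student1 84
course: course4
Id Name marks
3 MyName 60
"""

print(marks_by_regex(sample, "MyName"))  # [('course1', '69'), ('course4', '60')]
```

Two caveats with this sketch: the whole word "course" appearing anywhere inside a block (for example as a student name) would cut the match short, and the student name is matched as plain text, so MyName would also match inside a longer name such as MyName2.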
Upvotes: 5