Deepa

Reputation: 872

Python: regex findall

I am using Python regex to extract certain values from a given string. This is my string:

mystring.txt

sometext
somemore    text here

some  other text

              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
2               student2            73
8               student7            99
4               student4            32

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82

I need to extract the course name and the corresponding marks for a particular student. For example, I need the course and marks for MyName from the above string.

I tried:

re.findall(".*?course: (\w+).*?MyName\s+(\d+).*?",buff,re.DOTALL)

This works only if MyName is present under every course. It fails when MyName is missing from some course, as in my example string.

Here I get the output: [('course1', '69'), ('course2', '60')]

but what I actually want to achieve is: [('course1', '69'), ('course4', '60')]

What would be the correct regex for this?

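#!/usr/bin/python
import re

# Read the whole file into one string so the regex can span lines.
with open("mystring.txt", "r") as buffer_fp:
    buff = buffer_fp.read()

# For the sample above this prints [('course1', '69'), ('course2', '60')].
print(re.findall(r".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL))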

Upvotes: 5

Views: 443

Answers (2)

Béla

Reputation: 284

I suspect this is impossible to do in a single regular expression. They are not all-powerful.

Even if you find a way, don't do this. Your non-working regex is already close to unreadable; a working solution is likely to be even more so. You can most likely do this in just a few lines of meaningful code. Pseudocode solution:

for line in buff.splitlines():
    if it is a course line:
        set the course variable
    if it is a MyName line:
        add (course, marks) to the list of matches

Note that this could (and probably should) involve regexes in each of those if blocks. It's not a case of choosing between the hammer and the screwdriver to the exclusion of the other, but rather using them both for what they do best.
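A minimal runnable sketch of the line-by-line approach might look like the following. The function name `marks_for`, the two patterns, and the shortened sample text are illustrative assumptions, not part of the original answer; the patterns assume each course heading looks like `course: <name>` and each row is `Id Name marks`.

```python
import re

# Hypothetical pattern: a heading line such as "   course: course1".
COURSE_RE = re.compile(r"^\s*course:\s*(\w+)")


def marks_for(text, student):
    """Return (course, marks) pairs for one student, scanning line by line."""
    # Hypothetical pattern: a row such as "3   MyName   69".
    row_re = re.compile(r"^\s*\d+\s+" + re.escape(student) + r"\s+(\d+)\s*$")
    matches = []
    course = None
    for line in text.splitlines():
        m = COURSE_RE.match(line)
        if m:
            # Remember the most recent course heading.
            course = m.group(1)
            continue
        m = row_re.match(line)
        if m and course is not None:
            matches.append((course, m.group(1)))
    return matches


# Shortened version of the sample from the question.
sample = """\
              course: course1
Id              Name                marks
1               student1            65
3               MyName              69

              course: course2
Id              Name                marks
1               student1            84
8               student7            99

              course: course4
Id              Name                marks
1               student1            97
3               MyName              60
"""

print(marks_for(sample, "MyName"))
```

Because the current course is tracked in a variable, a course without a MyName row simply contributes nothing, which is the case the single-regex attempt gets wrong.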

Upvotes: 2

vks

Reputation: 67968

.*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?

                    ^^^^^^^^^^^^

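You can try this; see the demo. It puts a negative lookahead inside the quantifier, so the text matched between the course name and MyName cannot run past another "course" heading. MyName is therefore paired only with the course heading directly above it.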

https://regex101.com/r/pG1kU1/26
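As a rough check, the pattern above can be tried in Python like this. It is only a sketch: the sample text is a shortened copy of the one in the question, and the variable names are made up for illustration. Two details are assumptions about usage rather than part of the answer: `re.DOTALL` is passed, as in the question's own call, so that `.` can cross line breaks, and the pattern is written as a raw string, because in a plain Python string literal `\b` is a backspace character rather than a word boundary.

```python
import re

# Shortened version of the sample from the question.
sample = """\
              course: course1
Id              Name                marks
1               student1            65
3               MyName              69

              course: course2
Id              Name                marks
1               student1            84
8               student7            99

              course: course4
Id              Name                marks
1               student1            97
3               MyName              60
"""

# Pattern from the answer; the raw string keeps \b as a word boundary.
pattern = re.compile(
    r".*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?",
    re.DOTALL,
)

result = pattern.findall(sample)
print(result)  # [('course1', '69'), ('course4', '60')]
```

The course2 block is skipped because the lookahead stops the middle part of the match at the next "course:" heading before any MyName row is reached.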

Upvotes: 5
