Reputation: 2207
This is the text file sb.txt
JOHN:ENGINEER:35?:
Now this is the piece of code that tries to perform a regex search on the above line.
biodata1 = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?:',line)
Now I get a proper output for biodata1.group(1), biodata1.group(2) and biodata1.group(3).
If, however, I modify the file by removing the ":" from the end
JOHN:ENGINEER:35?
and run the script again, I get the following error which makes sense since group(3) didn't match successfully
Traceback (most recent call last):
File "dictionary.py", line 26, in <module>
print('re.search(r([\w\W])+?:([\w\W])+?:([\w\W])+? '+biodata1.group(1)+' '+biodata1.group(2)+' '+biodata1.group(3)) # STMT1
AttributeError: 'NoneType' object has no attribute 'group'
But group(1) and group(2) should still have matched "N" and "R" respectively. Is there any way to avoid this error and take a best-effort approach to the regex, so that it doesn't fail and at least prints biodata1.group(1) and biodata1.group(2)?
I tried editing the output statement so that it no longer prints biodata1.group(3), but that didn't work.
Upvotes: 1
Views: 147
Reputation: 28302
I think you misunderstand what has happened. Your entire regular expression has failed to match and therefore there is no match object.
Where it says AttributeError: 'NoneType' object has no attribute 'group'
it's trying to tell you that biodata1 is None. None is what re.search returns when it fails to match.
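A minimal sketch of guarding against that None, using the pattern from the question. The helper name last_chars is made up for illustration; only re.search and Match.groups come from the standard library.

```python
import re

# Pattern copied from the question; all three colons are required.
PATTERN = r'([\w\W])+?:([\w\W])+?:([\w\W])+?:'

def last_chars(line):
    """Return the three captured characters, or None if the pattern does not match."""
    match = re.search(PATTERN, line)
    if match is None:  # nothing matched, so there is no match object to query
        return None
    return match.groups()

print(last_chars('JOHN:ENGINEER:35?:'))  # ('N', 'R', '?')
print(last_chars('JOHN:ENGINEER:35?'))   # None
```

Checking for None before calling .group() avoids the AttributeError, but it does not recover the first two fields; for that the pattern itself has to change.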
To be clear, there's no way to get a "best match". What you're asking for is that re should make a decision as to what you really want. If you want groups to be optional, you need to make them optional.
Depending on what you actually want you can try the regexes:
r'([\w\W])+?:([\w\W])+?:([\w\W])+?:?'
or
r'([\w\W])+?:([\w\W])+?:(([\w\W])+?:)?'
These make, respectively, the last : optional and the entire last group optional.
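As a rough check of how the two suggestions behave on the line without the trailing colon, something like the following can be run. The constant names and the groups_for helper are made up for this sketch.

```python
import re

# The two patterns suggested above.
LAST_COLON_OPTIONAL = r'([\w\W])+?:([\w\W])+?:([\w\W])+?:?'
LAST_GROUP_OPTIONAL = r'([\w\W])+?:([\w\W])+?:(([\w\W])+?:)?'

def groups_for(pattern, line):
    """Return all captured groups, or None when the pattern does not match."""
    match = re.search(pattern, line)
    return None if match is None else match.groups()

line = 'JOHN:ENGINEER:35?'
print(groups_for(LAST_COLON_OPTIONAL, line))        # ('N', 'R', '3')
print(groups_for(LAST_GROUP_OPTIONAL, line))        # ('N', 'R', None, None)
print(groups_for(LAST_GROUP_OPTIONAL, line + ':'))  # ('N', 'R', '35?:', '?')
```

With the first pattern, the lazy +? in the last group stops after a single character because nothing after it forces it to go further, so group(3) is '3' rather than '?'. With the second pattern, the optional group is simply skipped and its groups come back as None.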
Upvotes: 3
Reputation: 3149
You'll have to modify the regex to instruct it on what exactly is optional and what isn't. Python regexes don't have this concept of partial matches. One possibility is to change it to
biodata1 = re.search(r'([\w\W])+?:(?:([\w\W])+?:(?:([\w\W])+?:)?)?',line)
This allows 1, 2 or 3 groups to match. Any group that doesn't take part in the match returns None from match.group(X), so check for None (or use match.groups(default='')) before concatenating the results into a string.
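A short sketch of the nested-optional pattern in use. The fields helper and the extra sample line 'JOHN:' are assumptions added for illustration; Match.groups accepts a default value that is substituted for groups that did not participate.

```python
import re

# Nested optional groups: 1, 2 or 3 colon-terminated fields may match.
NESTED_OPTIONAL = r'([\w\W])+?:(?:([\w\W])+?:(?:([\w\W])+?:)?)?'

def fields(line, default=None):
    """Return the captured groups, substituting `default` for unmatched ones."""
    match = re.search(NESTED_OPTIONAL, line)
    if match is None:
        return None
    return match.groups(default)

print(fields('JOHN:ENGINEER:35?:'))  # ('N', 'R', '?')
print(fields('JOHN:ENGINEER:35?'))   # ('N', 'R', None)
print(fields('JOHN:', default=''))   # ('N', '', '')
```

Passing default='' is one way to make string concatenation in the print statement safe when a group is missing.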
Upvotes: 2
Reputation: 6278
A regex matches exactly what you specify. There is no "best try" or anything like that.
If you want some part of your match to be optional, you need to declare it using the ? quantifier. So in your case the regex would need to look like this:
biodata1 = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?:?',line)
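Also note that +? is the lazy form of + (one or more times, as few as possible); it is not the same as * (zero or more times, as many as possible). If empty fields are acceptable, a greedy variant would be: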
biodata1 = re.search(r'([\w\W])*:([\w\W])*:([\w\W])*:?',line)
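The lazy +? form and the greedy * form are not interchangeable, which a quick comparison on the two sample lines makes visible. The constant names and the groups_for helper are invented for this sketch.

```python
import re

LAZY_PLUS = r'([\w\W])+?:([\w\W])+?:([\w\W])+?:?'
GREEDY_STAR = r'([\w\W])*:([\w\W])*:([\w\W])*:?'

def groups_for(pattern, line):
    """Return all captured groups, or None when the pattern does not match."""
    match = re.search(pattern, line)
    return None if match is None else match.groups()

without_colon = 'JOHN:ENGINEER:35?'
with_colon = 'JOHN:ENGINEER:35?:'

print(groups_for(LAZY_PLUS, without_colon))    # ('N', 'R', '3')
print(groups_for(LAZY_PLUS, with_colon))       # ('N', 'R', '3')
print(groups_for(GREEDY_STAR, without_colon))  # ('N', 'R', '?')
print(groups_for(GREEDY_STAR, with_colon))     # ('R', '?', None)
```

Because [\w\W] also matches ":", the greedy version grabs as much as it can and then backtracks, so on the line with the trailing colon the fields shift: the first group ends at "ENGINEER" and the third group matches nothing.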
Upvotes: 1