Reputation: 2207
This is the content of the input file.
sb.txt
JOHN:ENGINEER:35
These are the patterns that are used to evaluate the file.
import re

# Read the file line by line and apply both patterns to each line.
with open(r'C:\Users\dhiwakarr\PycharmProjects\BasicConcepts\sb.txt', 'r') as finp:
    for line in finp:
        biodata1 = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?', line)
        biodata2 = re.search(r'([\w\W]+?):([\w\W]+?):([\w\W]+?)', line)
        print('line is ' + line)
        print(r're.search(r([\w\W])+?:([\w\W])+?:([\w\W])+? ' + biodata1.group(1) + ' ' + biodata1.group(2) + ' ' + biodata1.group(3))
        print(r're.search(r([\w\W]+?):([\w\W]+?):([\w\W]+?) ' + biodata2.group(1) + ' ' + biodata2.group(2) + ' ' + biodata2.group(3))
This is the output I got:
line is JOHN:ENGINEER:35
re.search(r([\w\W])+?:([\w\W])+?:([\w\W])+? N R 3
re.search(r([\w\W]+?):([\w\W]+?):([\w\W]+?) JOHN ENGINEER 3
I have a couple of questions about the output it produces.
Why did the first search pattern match the last characters of JOHN and ENGINEER, but the first character of 35? I was expecting the non-greedy "?" to stop as soon as the first characters of JOHN and ENGINEER were found.
Can someone help me understand how the placement of "+?" affects the output in each statement?
Upvotes: 0
Views: 957
Reputation: 8335
The difference between biodata1 and biodata2 is the placement of the parentheses relative to the +? quantifier.
biodata1 :
([\w\W])+?:([\w\W])+?:([\w\W])+?
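Explanation:
Each group encloses a single character class, [\w\W], and the +? quantifier sits outside the parentheses, so the group itself is repeated. A repeated capturing group keeps only its last iteration, so group(1) holds N, the last character before the first ":".
Likewise, group(2) holds R, the last character before the second ":".
Nothing follows group(3) in the pattern, so the lazy +? stops after the minimum of one repetition and group(3) holds 3, the first character after the second ":".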
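To see the "last iteration wins" behaviour in isolation, here is a minimal sketch that runs the first pattern against the sample line as a literal string (the variable names are just for illustration):

```python
import re

line = "JOHN:ENGINEER:35"

# The quantifier is outside the parentheses, so each one-character group is
# repeated and every repetition overwrites the previous capture.
m = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?', line)

first_pattern_groups = m.groups()
whole_match = m.group(0)

print(first_pattern_groups)  # ('N', 'R', '3')
print(whole_match)           # JOHN:ENGINEER:3
```

The overall match still covers JOHN:ENGINEER:3; only the captured groups are reduced to a single character each.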
biodata2 :
([\w\W]+?):([\w\W]+?):([\w\W]+?)
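Explanation:
Here the +? quantifier is inside the parentheses, so group(1) captures the whole run of characters (at least one) before the first ":", which is JOHN.
Likewise, group(2) captures everything between the two colons, which is ENGINEER.
group(3) starts after the second ":", but because +? is lazy and nothing follows it in the pattern, it stops after one character, which is 3.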
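If the goal is to capture the full 35, the lazy quantifier needs something after it to force it to expand. One possible approach, which is a suggestion and not part of the original patterns, is to anchor the end with $:

```python
import re

line = "JOHN:ENGINEER:35"

# Quantifier inside the parentheses: each group captures the whole run.
m = re.search(r'([\w\W]+?):([\w\W]+?):([\w\W]+?)', line)
inside_groups = m.groups()

# Assumption: adding a trailing $ anchor (not in the original pattern) makes
# the last lazy group keep expanding until it reaches the end of the string.
anchored = re.search(r'([\w\W]+?):([\w\W]+?):([\w\W]+?)$', line)
anchored_groups = anchored.groups()

print(inside_groups)    # ('JOHN', 'ENGINEER', '3')
print(anchored_groups)  # ('JOHN', 'ENGINEER', '35')
```

Making the last quantifier greedy (+ instead of +?) would have a similar effect on this input.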
+?:
This is the lazy (non-greedy) form of +. It matches one or more of the preceding item, but as few repetitions as the rest of the pattern allows.
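A small comparison of + and +? on the same input may help. This is only an illustrative sketch using the sample line from the question:

```python
import re

text = "JOHN:ENGINEER:35"

# Greedy: takes as many characters as possible.
greedy = re.match(r'[\w\W]+', text).group()

# Lazy with nothing after it: stops after the minimum of one character.
lazy = re.match(r'[\w\W]+?', text).group()

# Lazy followed by ":" expands only until the first colon.
lazy_then_colon = re.match(r'[\w\W]+?:', text).group()

# Greedy followed by ":" backtracks only to the last colon.
greedy_then_colon = re.match(r'[\w\W]+:', text).group()

print(greedy)             # JOHN:ENGINEER:35
print(lazy)               # J
print(lazy_then_colon)    # JOHN:
print(greedy_then_colon)  # JOHN:ENGINEER:
```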
Upvotes: 2