Dhiwakar Ravikumar

Reputation: 2207

Python Regular Expression with Greedy Groups

This is the content of the input file.

 sb.txt
 JOHN:ENGINEER:35

This is the code that applies the two patterns to each line of the file.

import re

with open(r'C:\Users\dhiwakarr\PycharmProjects\BasicConcepts\sb.txt', 'r') as finp:
    for line in finp:
        line = line.rstrip('\n')  # drop the trailing newline before matching
        # pattern 1: the +? quantifier is applied to a one-character group
        biodata1 = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?', line)
        # pattern 2: the +? quantifier is inside each group
        biodata2 = re.search(r'([\w\W]+?):([\w\W]+?):([\w\W]+?)', line)
        print('line is ' + line)
        print(r're.search(r([\w\W])+?:([\w\W])+?:([\w\W])+? ' + ' '.join(biodata1.groups()))
        print(r're.search(r([\w\W]+?):([\w\W]+?):([\w\W]+?) ' + ' '.join(biodata2.groups()))

This is the output I got

line is JOHN:ENGINEER:35
re.search(r([\w\W])+?:([\w\W])+?:([\w\W])+? N R 3
re.search(r([\w\W]+?):([\w\W]+?):([\w\W]+?) JOHN ENGINEER 3

I have a couple of questions about the output it produces.

  1. Why does the first pattern capture the last character of JOHN and of ENGINEER, but the first character of 35? I was expecting the non-greedy "?" to stop as soon as the first character of JOHN and of ENGINEER was found.

  2. Can someone help me understand how the placement of "+?" affects the output of each statement?

Upvotes: 0

Views: 957

Answers (1)

The6thSense

Reputation: 8335

The difference between biodata1 and biodata2 is where the parentheses are placed.

biodata1 :

([\w\W])+?:([\w\W])+?:([\w\W])+?

Explanation:

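The parentheses capture a single character, and +? repeats that one-character group until the : can match. A repeated group only remembers its last repetition, so group(1) is the character just before the first : (N).
Likewise, group(2) is R.
But nothing follows group(3) in the pattern, so the lazy +? stops after one repetition and group(3) is only the first character after the second :, which is 3.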
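A minimal sketch of the point above, using the line from sb.txt as a plain string instead of reading the file: with the quantifier outside the group, each group keeps only its last repetition. With the quantifier inside the group, each group captures the whole run. The overall match is the same in both cases.

```python
import re

line = "JOHN:ENGINEER:35"

# Quantifier outside the group: the group is re-entered once per character,
# and only the capture from the final repetition is kept.
m1 = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?', line)
print(m1.groups())  # ('N', 'R', '3')

# Quantifier inside the group: a single capture spans the whole repeated run.
m2 = re.search(r'([\w\W]+?):([\w\W]+?):([\w\W]+?)', line)
print(m2.groups())  # ('JOHN', 'ENGINEER', '3')

# group(0), the full match, is identical for both patterns;
# only what each numbered group remembers differs.
print(m1.group(0), m2.group(0))  # JOHN:ENGINEER:3 JOHN:ENGINEER:3
```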

biodata2 :

([\w\W]+?):([\w\W]+?):([\w\W]+?)

Explanation:

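[\w\W] matches any character (word or non-word), and the +? is now inside the group, so group(1) captures the whole run of at least one character before the first :, which is JOHN.
Likewise, group(2) captures ENGINEER.
But nothing follows group(3) in the pattern, so the lazy +? again stops after one character and group(3) is only 3, not 35.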
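If the goal is to capture all of 35, group(3) needs something to stop at. The sketch below shows three possible options, not the only ones: anchor the end of the pattern with $, use a greedy "anything except a colon" class for every field, or skip regular expressions and split the line.

```python
import re

line = "JOHN:ENGINEER:35"

# Option 1: keep the lazy groups but anchor the end with $, so group(3)
# has to keep growing until it reaches the end of the line.
m_anchor = re.search(r'([\w\W]+?):([\w\W]+?):([\w\W]+?)$', line)
print(m_anchor.groups())  # ('JOHN', 'ENGINEER', '35')

# Option 2: describe each field as "one or more non-colon characters"
# and let the greedy + take all of them.
m_field = re.search(r'([^:]+):([^:]+):([^:]+)', line)
print(m_field.groups())  # ('JOHN', 'ENGINEER', '35')

# Option 3: a plain colon-separated line does not need a regex at all.
fields = line.split(':')
print(fields)  # ['JOHN', 'ENGINEER', '35']
```

When the line comes straight from the file it still ends in "\n". $ matches just before a trailing newline, so option 1 still gives 35. Options 2 and 3 would keep the "\n" in the last field unless the line is stripped first with line.rstrip('\n').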

+?:

This matches one or more repetitions of the preceding item, but lazily: it starts with a single repetition and consumes more only when the rest of the pattern would otherwise fail to match.
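A small illustration of greedy + versus lazy +? on their own. The string "aaab" is an arbitrary example chosen for this sketch.

```python
import re

s = "aaab"

greedy = re.match(r'a+', s).group()         # takes as many a's as possible
lazy = re.match(r'a+?', s).group()          # takes as few a's as possible
lazy_then_b = re.match(r'a+?b', s).group()  # lazy, but must extend to reach the b

print(greedy)       # aaa
print(lazy)         # a
print(lazy_then_b)  # aaab
```

The last case is the same effect as group(1) and group(2) in both patterns above: the lazy quantifier keeps extending until the following : can match.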

Upvotes: 2
