Reputation: 13
I'm confused about the groups in a regex from the book "Automate the Boring Stuff with Python: Practical Programming for Total Beginners". The code is as follows:
#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard
# The pasted text comes from: https://www.nostarch.com/contactus.html
import pyperclip, re
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))? # area code
    (\s|-|\.)? # separator
    (\d{3}) # first 3 digits
    (\s|-|\.) # separator
    (\d{4}) # last 4 digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))? # extension
    )''', re.VERBOSE)
# TODO: Create email regex.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+ # username
    @ # @ symbol
    [a-zA-Z0-9.-]+ # domain name
    (\.[a-zA-Z]{2,4}) # dot-something
    )''', re.VERBOSE)
# TODO: Find matches in clipboard text.
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
    print(groups[0])
for groups in emailRegex.findall(text):
    matches.append(groups[0])
# TODO: Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone number or email addresses found.')
I am confused about groups[1], groups[2], ..., groups[8]. How many groups are there in phoneRegex? And what is the difference between groups() and groups[]?
The pasted data comes from: https://www.nostarch.com/contactus.html
Upvotes: 0
Views: 482
Reputation: 271050
Regexes can have groups, which are denoted by (). Groups can be used to extract a part of the match that might be useful.
In the phone number regex, for example, there are 9 groups:
Group  Subpattern
1      ((\d{3}|\(\d{3}\))?(\s|-|\.)?(\d{3})(\s|-|\.)(\d{4})(\s*(ext|x|ext.)\s*(\d{2,5}))?)
2      (\d{3}|\(\d{3}\))?
3      (\s|-|\.)?
4      (\d{3})
5      (\s|-|\.)
6      (\d{4})
7      (\s*(ext|x|ext.)\s*(\d{2,5}))?
8      (ext|x|ext.)
9      (\d{2,5})
Note how each group is enclosed in a pair of ().
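The numbering can be checked with a small sketch. The nested pattern and sample string here are made up for illustration, not taken from the book. Groups are numbered by the position of their opening parenthesis, so an outer group gets a lower number than the groups nested inside it.

```python
import re

# Hypothetical nested pattern: one outer group wrapping two inner groups.
nested = re.compile(r'((\d+)-(\d+))')

m = nested.search('range 10-20')
print(nested.groups)  # number of capturing groups in the pattern
print(m.group(1))     # outer group: the whole '10-20'
print(m.group(2))     # first inner group: '10'
print(m.group(3))     # second inner group: '20'
```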
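groups[x] simply refers to the string matched by a particular group: groups[0] is the string matched by group 1, groups[1] is the string matched by group 2, and so on. The index is one less than the group number because findall returns each match as a tuple, and tuples are indexed from 0.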
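As a rough illustration of that indexing, here is a sketch that runs a copy of the phone regex on a made-up sample string. The names phone_regex and sample are mine, the quantifier is written as {2,5}, and the dot in ext\. is escaped, which is my adjustment rather than the book's spelling.

```python
import re

# Copy of the question's phone regex; {2,5} and the escaped dot are adjustments.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # group 2: area code
    (\s|-|\.)?                       # group 3: separator
    (\d{3})                          # group 4: first 3 digits
    (\s|-|\.)                        # group 5: separator
    (\d{4})                          # group 6: last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # groups 7, 8, 9: extension
    )''', re.VERBOSE)

sample = 'Call 415-555-1234 ext 99 today'  # made-up number for illustration

for groups in phone_regex.findall(sample):
    # groups is a 9-element tuple; index i holds group number i + 1.
    for index, value in enumerate(groups):
        print(f'groups[{index}] (group {index + 1}): {value!r}')

# Optional groups that did not take part in the match come back as ''.
short = phone_regex.findall('555-1234')[0]
print(short)
```

This is why the question's code joins groups[1], groups[3], and groups[5] (area code, first 3 digits, last 4 digits) and checks groups[8] against '' before appending the extension.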
Upvotes: 1
Reputation: 1933
In a regex, parentheses () create what is called a capturing group. Each group is assigned a number, starting with 1.
For example:
In [1]: import re
In [2]: m = re.match('([0-9]+)([a-z]+)', '123xyz')
In [3]: m.group(1)
Out[3]: '123'
In [4]: m.group(2)
Out[4]: 'xyz'
Here, ([0-9]+) is the first capturing group, and ([a-z]+) is the second capturing group. When you apply the regex, the first capturing group ends up "capturing" the string 123 (since that's the part it matches), and the second group captures xyz.
findall searches the string for all places where the regex matches and returns a list with one entry per match. When the regex has more than one capturing group, each entry is a tuple of the captured groups. I'd encourage you to play with it a bit in ipython to understand how it works. Also check the docs: https://docs.python.org/3.6/library/re.html#re.findall
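On the groups() versus groups[...] part of the question, a small sketch may help. It reuses the pattern from the example above with a made-up input string. groups() is a method on a match object, while groups[...] in the question's code is ordinary tuple indexing on the loop variable that findall produces, which just happens to be named groups.

```python
import re

pattern = re.compile(r'([0-9]+)([a-z]+)')
text = '123xyz 45ab'  # made-up input with two matches

# A match object: group(n) is 1-based, group(0) is the whole match,
# and groups() returns all captured groups as one tuple.
m = pattern.search(text)
print(m.group(0))   # '123xyz'
print(m.group(1))   # '123'
print(m.groups())   # ('123', 'xyz')

# findall: a list of plain tuples, one per match, indexed from 0.
found = pattern.findall(text)
print(found)        # [('123', 'xyz'), ('45', 'ab')]
for groups in found:
    print(groups[0], groups[1])
```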
Upvotes: 0