Reputation: 2504
I have the following text:
LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender Marital: Married Adult: M/F: Env.No.:
LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471
E-mail : [email protected]
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5
I need to obtain the text after the phone numbers, but a record can have a Home Phone, Cell Phone, Emergency Phone, Fax, or Work Phone, in any order. Is there a regular expression that gives me the text after the last phone number? For example, in the second block of text, I want the text after Cell Phone: 888-888-8888.
Upvotes: 0
Views: 90
Reputation: 20506
In [1]: import re
In [2]: s=""" LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
...: 192 generic St.
...: Newton MA 02471
...: Status: Attender Marital: Married Adult: M/F: Env.No.:
...:
...:
...: LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
...: 10 generic St.
...: Newton MA 02471
...:
...: E-mail : [email protected]
...: Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5"""
In [3]:
In [4]: re.findall('[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)', s, re.MULTILINE)
Out[4]: ['192 generic St. ', '10 generic St. ']
NODE          EXPLANATION
-----------------------------------------------------
[0-9]{3}      any character of: '0' to '9' (3 times)
-----------------------------------------------------
-             '-'
-----------------------------------------------------
[0-9]{3}      any character of: '0' to '9' (3 times)
-----------------------------------------------------
-             '-'
-----------------------------------------------------
[0-9]{4}      any character of: '0' to '9' (4 times)
-----------------------------------------------------
\n            '\n' (newline)
-----------------------------------------------------
(             group and capture to \1:
-----------------------------------------------------
.*            any character except \n (0 or more times
              (matching the most amount possible))
-----------------------------------------------------
)             end of \1
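The session above can also be run as a plain script. This is a minimal sketch rather than a definitive solution: it assumes each phone number has the form NNN-NNN-NNNN and that the last number on a line is followed directly by the newline, with no trailing spaces. The names text, pattern, and lines_after_phone are illustrative only, and the e-mail line from the sample is left out because the pattern never touches it.

```python
import re

# Sample text from the question (e-mail line omitted; it is not needed here).
text = """LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender Marital: Married Adult: M/F: Env.No.:
LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5"""

# A phone number immediately followed by a newline is necessarily the last
# one on its line; group 1 captures the whole next line.
pattern = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)")

lines_after_phone = pattern.findall(text)
print(lines_after_phone)
```

Because the pattern requires the newline right after the digits, 777-777-2205 is skipped (a space follows it) and only 888-888-8888 matches in the second record, whatever labels or order the phone fields have.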
Upvotes: 2
Reputation: 6186
Is this what you want?
doc = '''LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender Marital: Married Adult: M/F: Env.No.:
LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471
E-mail : [email protected]
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5'''
import re
p = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)')
for x in p.finditer(doc):
    print(x.group(1))
The output is
192 generic St.
10 generic St.
Explanation
[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)
__________________________       <- phone number
                          __     <- newline
                            ____ <- this part is group(1)
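The three parts in the diagram can also be given names, which may make the result easier to read. The sketch below is a variation, not the pattern from the answer itself: the group names phone and after are hypothetical, and the extra [ \t]* is an assumption added so that stray spaces after the last phone number do not prevent a match.

```python
import re

# Second record from the question, with two trailing spaces added after the
# last phone number to show why the optional-blank part can be useful.
record = (
    "LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888  \n"
    "10 generic St.\n"
    "Newton MA 02471\n"
)

labelled = re.compile(
    r"(?P<phone>[0-9]{3}-[0-9]{3}-[0-9]{4})"  # phone number
    r"[ \t]*\n"                               # optional trailing blanks, then newline
    r"(?P<after>.*)"                          # the line after the last phone number
)

m = labelled.search(record)
last_phone = m.group("phone")
after = m.group("after")
print(last_phone, "->", after)
```

Here m.group("phone") is the last phone number on the line and m.group("after") plays the role of group(1) in the diagram.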
Upvotes: 1