Haris

Reputation: 85

How to extract person names using a regular expression?

I am new to regular expressions and I have a kind of phone directory. I want to extract the names from it. I wrote the code below, but it extracts lots of unwanted text rather than just the names. Can you kindly tell me what I am doing wrong and how to correct it? Here is my code:

import re

directory = '''Mark Adamson
Home: 843-798-6698
(424) 345-7659
265-1864 ext. 4467
326-665-8657x2986
E-mail:[email protected]
Allison Andrews
Home: 612-321-0047
E-mail: [email protected]
Cellular: 612-393-0029
Dustin Andrews'''


nameRegex = re.compile(r'''
(
[A-Za-z]{2,25}
\s
([A-Za-z]{2,25})+
)

''',re.VERBOSE)

print(nameRegex.findall(directory)) 

The output it gives is:

[('Mark Adamson', 'Adamson'), ('net\nAllison', 'Allison'), ('Andrews\nHome', 'Home'), ('com\nCellular', 'Cellular'), ('Dustin Andrews', 'Andrews')]

I would be really grateful for any help!

Upvotes: 4

Views: 1877

Answers (4)

Booboo

Reputation: 44043

Try:

nameRegex = re.compile(r'^((?:\w+\s*){2,})$', flags=re.MULTILINE)

This will only choose complete lines that are made up of two or more names composed of 'word' characters (note that \w also matches digits and underscores).
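A minimal sketch that runs this pattern against the question's directory, copied exactly as shown on the page (the e-mail addresses were masked there). The extra line "Room 101" is a hypothetical input, not from the question, showing that \w lets digits through:

```python
import re

# Directory text copied from the question (e-mail addresses masked by the page).
directory = '''Mark Adamson
Home: 843-798-6698
(424) 345-7659
265-1864 ext. 4467
326-665-8657x2986
E-mail:[email protected]
Allison Andrews
Home: 612-321-0047
E-mail: [email protected]
Cellular: 612-393-0029
Dustin Andrews'''

# With re.MULTILINE, ^ and $ anchor at every line, so only whole lines made of
# two or more \w+ runs (with optional whitespace between them) can match.
name_regex = re.compile(r'^((?:\w+\s*){2,})$', flags=re.MULTILINE)

names = name_regex.findall(directory)
print(names)  # ['Mark Adamson', 'Allison Andrews', 'Dustin Andrews']

# Hypothetical line: \w matches digits too, so this is accepted as well.
extra = name_regex.findall("Room 101")
print(extra)  # ['Room 101']
```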

Upvotes: 0

milanbalazs

Reputation: 5319

The following regex works as expected.

Related part of the code:

nameRegex = re.compile(r"^[a-zA-Z]+[',. -][a-zA-Z ]?[a-zA-Z]*$", re.MULTILINE)

print(nameRegex.findall(directory))

Output:

$ python3 test.py
['Mark Adamson', 'Allison Andrews', 'Dustin Andrews']
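To see what the pattern accepts beyond the question's data, here is a quick sketch with hypothetical sample lines (only "Mark Adamson" comes from the question). After the single separator the pattern allows just one more word, so a three-word name is rejected:

```python
import re

# Pattern from the answer; re.MULTILINE makes ^ and $ match at each line boundary.
name_regex = re.compile(r"^[a-zA-Z]+[',. -][a-zA-Z ]?[a-zA-Z]*$", re.MULTILINE)

# Hypothetical test lines (except the first, which is from the question).
samples = ["Mark Adamson", "John Paul Jones", "Home: 843-798-6698"]

# search() is enough because the pattern is already anchored with ^ and $.
results = {line: bool(name_regex.search(line)) for line in samples}
print(results)
# {'Mark Adamson': True, 'John Paul Jones': False, 'Home: 843-798-6698': False}
```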

Upvotes: 0

benrussell80

Reputation: 347

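Your problem is that \s also matches newlines, so a match such as 'net\nAllison' can run across two lines. Use a literal space instead of \s (and drop the capturing groups, which are what make findall() return tuples), that is: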

name_regex = re.compile('[A-Za-z]{2,25} [A-Za-z]{2,25}')
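A small check of this diagnosis. The question's real e-mail addresses were masked by the page, so the sample below uses a hypothetical address ending in ".net" to reproduce the reported symptom, comparing the question's grouped \s pattern (without re.VERBOSE) against the literal-space pattern:

```python
import re

# Hypothetical sample: "joe@example.net" stands in for a masked address ending in ".net".
sample = "joe@example.net\nAllison Andrews"

# Question's pattern: \s matches '\n', and the two groups make findall() return tuples.
question = re.findall(r'([A-Za-z]{2,25}\s([A-Za-z]{2,25})+)', sample)

# Literal space: cannot cross a line break, and with no groups findall() returns strings.
answer = re.findall(r'[A-Za-z]{2,25} [A-Za-z]{2,25}', sample)

print(question)  # [('net\nAllison', 'Allison')]
print(answer)    # ['Allison Andrews']
```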

The literal-space pattern works only if each name has exactly two words. If names can have more than two words (middle names) or contain hyphens (hyphenated last names), you may want to expand it to something like:

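name_regex = re.compile(r"^[A-Za-z][A-Za-z \-]+$", re.MULTILINE)

This looks for one or more words (letters, spaces, and hyphens) and stretches from the beginning to the end of a line, so it will not just get 'John Paul' from 'John Paul Jones'. Because the pattern has no capturing group, findall() returns each whole matching line.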
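A pattern built from a quantified capturing group, such as ^([A-Za-z \-]{2,25})+$, has a catch with findall(): Python returns only the text captured by the group's last repetition. A minimal sketch with a hypothetical 30-character name (not from the question) compares it with a flat character class:

```python
import re

# Hypothetical 30-character name line, longer than the group's 25-character limit.
line = "Alexandra Montgomery-Pemberton"

# The group matches the line in chunks of at most 25 characters (25 + 5 here);
# findall() keeps only what the last repetition captured.
grouped = re.findall(r"^([A-Za-z \-]{2,25})+$", line, re.MULTILINE)

# With no capturing group, findall() returns the whole matched line.
ungrouped = re.findall(r"^[A-Za-z][A-Za-z \-]+$", line, re.MULTILINE)

print(grouped)    # ['erton']
print(ungrouped)  # ['Alexandra Montgomery-Pemberton']
```

Nested quantifiers like ( ... {2,25})+ can also backtrack heavily on long lines that ultimately fail to match, which is another reason to prefer the flat character class.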

Upvotes: 6

Daniel Nudelman

Reputation: 401

I suggest trying the following regex; it works for me:

r"([A-Z][a-z]+\s[A-Z][a-z]+)"

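Because \s also matches a newline, this pattern can still join two capitalized words from adjacent lines. A sketch with a hypothetical one-word entry (not in the question's data) shows the effect, and that a literal space avoids it:

```python
import re

# Hypothetical sample: a one-word entry followed by a "Home:" line.
sample = "Cher\nHome: 555-0100\nDustin Andrews"

# The pattern as suggested: \s matches '\n', so "Cher" and "Home" are joined.
with_s = re.findall(r"([A-Z][a-z]+\s[A-Z][a-z]+)", sample)

# Same pattern with a literal space, which keeps each match on one line.
with_space = re.findall(r"([A-Z][a-z]+ [A-Z][a-z]+)", sample)

print(with_s)      # ['Cher\nHome', 'Dustin Andrews']
print(with_space)  # ['Dustin Andrews']
```

It also requires each name to be capitalized, so lowercase entries would be skipped.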
Upvotes: 0
