user12575866

Reputation: 107

Trying to extract ONLY last name using regex from list

I'm having a problem extracting the last name from each entry in a list.

import re

names = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea', 'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']

for item in names:
    print(item)
    print(re.findall(r'(\s(.*))', item))

But the output is:

Cristiano Ronaldo
[(' Ronaldo', 'Ronaldo')]
L. Messi
[(' Messi', 'Messi')]
M. Neuer
[(' Neuer', 'Neuer')]
L. Suarez
[(' Suarez', 'Suarez')]
De Gea
[(' Gea', 'Gea')]
Z. Ibrahimovic
[(' Ibrahimovic', 'Ibrahimovic')]
G. Bale
[(' Bale', 'Bale')]
J. Boateng
[(' Boateng', 'Boateng')]
R. Lewandowski
[(' Lewandowski', 'Lewandowski')]

I am curious why each last name is returned twice; I only want to get each last name back once.

Can any of you kind folks help? Thank you!

Upvotes: 0

Views: 699

Answers (4)

Toto

Reputation: 91375

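\S matches any character that is not whitespace, so \S+$ matches the run of non-whitespace characters at the end of the string, which is the last name.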

import re

names = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea', 'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']

for item in names:
    print(item)
    print(re.findall(r'\S+$', item))  # match one or more non-whitespace characters at the end of the string

Output:

Cristiano Ronaldo
['Ronaldo']
L. Messi
['Messi']
M. Neuer
['Neuer']
L. Suarez
['Suarez']
De Gea
['Gea']
Z. Ibrahimovic
['Ibrahimovic']
G. Bale
['Bale']
J. Boateng
['Boateng']
R. Lewandowski
['Lewandowski']
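If you want one plain string per name rather than a one-element list, a minimal sketch is to use re.search, which returns a single match object (or None) instead of a list. This assumes Python 3 and the same input list, renamed names to avoid shadowing the built-in list; last_name is a hypothetical helper introduced only for illustration.

```python
import re

names = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea',
         'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']


def last_name(full_name):
    """Hypothetical helper: return the trailing run of non-whitespace
    characters, or None if the string has no such run at its end."""
    match = re.search(r'\S+$', full_name)
    return match.group() if match else None


last_names = [last_name(name) for name in names]
print(last_names)
```

Returning None for an empty string, instead of raising, is a design choice of this sketch; callers can filter those out if they occur.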

Upvotes: 1

FlorianGD

Reputation: 2436

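The two pairs of parentheses create two capturing groups, and when a pattern has more than one group, re.findall returns a tuple of all the groups for each match. Remove the outer pair and you will get only the last name: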

import re

names = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea', 'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']
for item in names:
    print(item)
    print(re.findall(r'\s(.*)', item))  # one group, so findall returns just its text
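The rule at work is how re.findall treats groups: with no groups it returns the whole match, with exactly one group it returns that group's text, and with several groups it returns a tuple per match. A small sketch comparing the patterns on a single name (the variable names are only for illustration) also shows the non-capturing group (?:...), which groups without capturing:

```python
import re

item = 'L. Messi'

# No capturing groups: findall returns the whole matched text.
no_groups = re.findall(r'\s.*', item)
# One capturing group: findall returns only that group's text.
one_group = re.findall(r'\s(.*)', item)
# Two capturing groups: findall returns an (outer, inner) tuple per match.
two_groups = re.findall(r'(\s(.*))', item)
# (?:...) groups without capturing, so only (.*) is returned.
non_capturing = re.findall(r'(?:\s(.*))', item)

print(no_groups)      # [' Messi']
print(one_group)      # ['Messi']
print(two_groups)     # [(' Messi', 'Messi')]
print(non_capturing)  # ['Messi']
```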

Upvotes: 3

Ron Serruya

Reputation: 4426

Check this out https://regex101.com/r/CGrruO/1

You can see that your regex has two capturing groups.
You added an extra set of (), so each match yields two groups: one with the leading space and one without.

Changing the pattern to \s(.*) should work.
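With the single-group pattern, a minimal sketch that collects plain strings uses re.search and match.group(1), which is the text captured by (.*) without the leading space. It assumes the question's list, renamed names; "A. B. Smith" in the note below is a made-up example.

```python
import re

names = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea',
         'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']

last_names = []
for item in names:
    # search() stops at the first match and returns a match object, or None.
    match = re.search(r'\s(.*)', item)
    if match:
        # group(1) is the text captured by (.*), without the leading space.
        last_names.append(match.group(1))

print(last_names)
```

Note that \s(.*) captures everything after the first whitespace, so a name with more than two parts, such as "A. B. Smith", gives "B. Smith"; \S+$ from the first answer keeps only the final word.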

Upvotes: 0

Rakesh

Reputation: 82755

Use str.split() with negative indexing: split() breaks the string on whitespace, and [-1] picks the last piece.

Ex:

lst = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea', 'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']

for item in lst:
    print(item.split()[-1])  # last whitespace-separated word

Output:

Ronaldo
Messi
Neuer
Suarez
Gea
Ibrahimovic
Bale
Boateng
Lewandowski
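To collect the results into a list instead of printing them, a minimal sketch uses a list comprehension. Skipping blank strings is an added assumption of this sketch: ''.split() returns an empty list, so [-1] would raise IndexError on an empty entry.

```python
lst = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea',
       'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']

# split() with no argument splits on runs of whitespace; [-1] takes the last word.
# The `if name.strip()` filter skips blank entries (an assumption of this sketch).
last_names = [name.split()[-1] for name in lst if name.strip()]
print(last_names)
```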

Upvotes: 3
