WoJ

Reputation: 29957

How to extract non-uppercase string elements for first and last names?

I have strings of the form

NAME Firstname

and I would like to get the Firstname part. The string can be more complicated (e.g. LAST LAST2 First First2). The rule is that the uppercase words form the last name and the rest is the first name. We can assume that the string starts with the uppercase part (the last name); once the words become mixed case, everything up to the end is the first name.

I am sure that the right combination of [A-Z] and \w in a regex would work. The best I came up with is

import re
re.findall(r'[A-Z]*\w+', 'LAST LAST2 First First2')

but it only gets me partway: it returns every word (['LAST', 'LAST2', 'First', 'First2']) :)

What would be a good way to extract the first name(s) as a single string in Python?
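The attempt above matches every word because `[A-Z]*` is allowed to match zero characters, so `[A-Z]*\w+` effectively behaves like `\w+`. One way to express the stated rule as a single regex is to anchor at the start of the string, consume whole uppercase words, and capture whatever remains. This is a minimal sketch, assuming last-name words consist only of the ASCII letters A-Z and digits (as in LAST2); the names `NAME_RE` and `split_name` are hypothetical.

```python
import re

# Hypothetical helper illustrating the rule from the question.
# Assumption: last-name words contain only A-Z and digits (e.g. LAST2);
# the first word containing a lowercase letter starts the first name,
# which runs to the end of the string.
# [A-Z0-9]+\b only succeeds on a whole word, so 'First' is never consumed.
NAME_RE = re.compile(r'((?:[A-Z0-9]+\b\s*)*)(.*)')


def split_name(s):
    """Return (last_name, first_name) for strings like 'LAST LAST2 First First2'."""
    # re.match anchors at the start; both groups may be empty, so it always matches.
    m = NAME_RE.match(s.strip())
    return m.group(1).strip(), m.group(2).strip()


print(split_name('LAST LAST2 First First2'))  # ('LAST LAST2', 'First First2')
print(split_name('NAME Firstname'))           # ('NAME', 'Firstname')
```

If the string holds only a last name, the second element is simply an empty string.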

Upvotes: 3

Views: 112

Answers (4)

alvas

Reputation: 122012

With regex:

import re
s = 'LAST LAST2 First First2'
print(re.search("[A-Z][a-z].*$", s).group().split())
  • [A-Z] match a single character present in the range between A and Z (case sensitive)
  • [a-z] match a single character present in the range between a and z (case sensitive)
  • .* matches any character (except newline) between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  • $ assert position at end of the string
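One caveat with calling `.group()` directly: when the string contains no uppercase letter followed by a lowercase one (for example, only a last name), `re.search` returns `None` and `.group()` raises `AttributeError`. A minimal guarded variant, where the function name `first_names` is hypothetical:

```python
import re


def first_names(s):
    """Return the first-name words, or [] if no mixed-case word is present."""
    # Same pattern as above: an uppercase letter followed by a lowercase one,
    # then everything up to the end of the string.
    m = re.search(r'[A-Z][a-z].*$', s)
    return m.group().split() if m else []


print(first_names('LAST LAST2 First First2'))  # ['First', 'First2']
print(first_names('LAST LAST2'))               # []
```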

Without regex:

s = 'LAST LAST2 First First2'
print([i for i in s.split() if not i.isupper()])

[out]:

['First', 'First2']
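The `isupper()` filter keeps every word that is not fully uppercase, wherever it appears. The rule in the question is slightly different: once the first mixed-case word appears, everything up to the end belongs to the first name. The two disagree when the first name contains an all-uppercase word, such as an initial. A sketch of the literal rule using `itertools.dropwhile`, with the made-up input `'LAST John F'`:

```python
from itertools import dropwhile

s = 'LAST John F'
words = s.split()

# Filter: drops every all-uppercase word, including the initial 'F'.
print([w for w in words if not w.isupper()])  # ['John']

# dropwhile: skips only the leading all-uppercase words (the last name)
# and keeps everything from the first mixed-case word onwards.
print(list(dropwhile(str.isupper, words)))    # ['John', 'F']
```

For the original example `'LAST LAST2 First First2'` both approaches give `['First', 'First2']`.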

Upvotes: 1

WKPlus

Reputation: 7255

Would this work for you?

import re
re.search("[A-Z][a-z].*$", "LAST LAST2 First First2").group()

Or, more robustly:

re.search(r"(?<= )[A-Z][^A-Z][\w\s]*$", "LAST LAST2 First First2").group()
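Note that inside a character class `|` is not alternation but a literal pipe character: a class written `[\w|\s]` matches word characters, whitespace and `|`, while `[\w\s]` matches only the first two. A quick check on a made-up input string:

```python
import re

text = 'Anna|Maria'

# '|' inside [...] is a literal character, so the first class also matches it.
print(re.search(r'[\w|\s]+', text).group())  # Anna|Maria
print(re.search(r'[\w\s]+', text).group())   # Anna
```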

Upvotes: 0

JonM

Reputation: 1374

Try:

import re
re.findall(r'\b[A-Z][a-z0-9_-]+', 'LAST LAST2 First First2')

This results in:

>>> re.findall(r'\b[A-Z][a-z0-9_-]+', 'LAST LAST2 First First2')
['First', 'First2']
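The pattern needs to be a raw string (`r'...'`). In an ordinary Python string literal, `'\b'` is the backspace character (`'\x08'`), not the regex word boundary, so the non-raw version quietly finds nothing:

```python
import re

s = 'LAST LAST2 First First2'

print('\b' == '\x08')                        # True: backspace, not a word boundary
print(re.findall('\b[A-Z][a-z0-9_-]+', s))   # []
print(re.findall(r'\b[A-Z][a-z0-9_-]+', s))  # ['First', 'First2']
```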

Upvotes: 0

senshin

Reputation: 10350

I would like to propose a non-regex solution:

string = 'LAST LAST2 First First2'
words = string.split() # equals ['LAST', 'LAST2', 'First', 'First2']
result = []
for word in words:
    if not word.isupper():
        result.append(word)
print(' '.join(result))

Result:

First First2
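One detail worth knowing about the splitting step: `str.split(' ')` splits on every single space, so repeated spaces produce empty strings. Since `''.isupper()` is `False`, those empty strings survive the filter and reappear as extra spaces after `' '.join`. Calling `split()` with no argument splits on runs of whitespace instead. The double-spaced input below is a made-up example:

```python
s = 'LAST  LAST2 First First2'  # note the double space after LAST

print(s.split(' '))  # ['LAST', '', 'LAST2', 'First', 'First2']
print(s.split())     # ['LAST', 'LAST2', 'First', 'First2']

# With split(' '), the empty string passes the isupper() test and is kept.
print(repr(' '.join(w for w in s.split(' ') if not w.isupper())))  # ' First First2'
print(repr(' '.join(w for w in s.split() if not w.isupper())))     # 'First First2'
```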

Upvotes: 2
