Reputation: 29957
I have strings of the form NAME Firstname and I would like to get the Firstname part. The string can be more complicated (LAST LAST2 First First2). The rule is that uppercase elements are the last name and the rest is the first name. We can assume that the string starts with uppercase words (= last name) and that everything from the first mixed-case word to the end is the first name.
I am sure that the right regex combination of [A-Z] and \w would work. The best I came up with is
import re
re.findall('[A-Z]*\w+', 'LAST LAST2 First First2')
but it returns only almost the right solution, since every word is matched (['LAST', 'LAST2', 'First', 'First2']) :)
What would be a good way to extract the first name(s) in Python as one string?
Upvotes: 3
Views: 112
Reputation: 122012
With regex:
import re
s = 'LAST LAST2 First First2'
print(re.search(r"[A-Z][a-z].*$", s).group().split())
[A-Z] matches a single character in the range between A and Z (case sensitive)
[a-z] matches a single character in the range between a and z (case sensitive)
.* matches any character (except newline), between zero and unlimited times, as many times as possible, giving back as needed [greedy]
$ asserts position at the end of the string
Non-regex:
s = 'LAST LAST2 First First2'
print([i for i in s.split() if not i.isupper()])
[out]:
['First', 'First2']
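A small check of how str.isupper() treats the kinds of tokens in the question may help here. This is only an illustrative sketch; the token list is made up for the example. Digits and punctuation are ignored as long as the token contains at least one cased character, so LAST2 counts as uppercase, while a purely numeric token does not.

```python
# Illustrative tokens (assumed for this example, not from the question).
tokens = ['LAST', 'LAST2', 'First', "O'NEIL", '2']

# isupper() is True when every cased character is uppercase
# and the string contains at least one cased character.
flags = {token: token.isupper() for token in tokens}

for token, is_upper in flags.items():
    print(token, is_upper)
```

With the non-regex filter, a token such as '2' would therefore be kept as part of the first name.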
Upvotes: 1
Reputation: 7255
Would this code help?
import re
re.search(r"[A-Z][a-z].*$", "LAST LAST2 First First2").group()
Or, to be more robust:
re.search(r"(?<= )[A-Z][^A-Z][\w\s]*$", "LAST LAST2 First First2").group()
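Note that re.search returns None when nothing matches, so calling .group() directly raises AttributeError on a string with no first name. A minimal sketch of a guard, assuming the simpler pattern from this answer; the names FIRST_NAME_RE and extract_first_name are hypothetical:

```python
import re

# Pattern taken from the answer; the names below are illustrative only.
FIRST_NAME_RE = re.compile(r"[A-Z][a-z].*$")

def extract_first_name(full_name):
    """Return the first-name part as one string, or None if there is none."""
    match = FIRST_NAME_RE.search(full_name)
    return match.group() if match else None

print(extract_first_name("LAST LAST2 First First2"))
print(extract_first_name("LAST LAST2"))
```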
Upvotes: 0
Reputation: 1374
try:
import re
re.findall(r'\b[A-Z][a-z0-9_-]+', 'LAST LAST2 First First2')
this will result in:
>>> re.findall(r'\b[A-Z][a-z0-9_-]+', 'LAST LAST2 First First2')
['First', 'First2']
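The \b in this pattern is meant as a regex word boundary, which only works if the pattern is written as a raw string. In an ordinary Python string literal, '\b' is the backspace character. A short sketch of the difference (the variable names are just for illustration):

```python
import re

text = 'LAST LAST2 First First2'

# Ordinary literal: '\b' is a backspace character (\x08), so the regex
# looks for a literal backspace and finds nothing in this text.
plain_result = re.findall('\b[A-Z][a-z0-9_-]+', text)

# Raw literal: the regex engine sees \b and treats it as a word boundary.
raw_result = re.findall(r'\b[A-Z][a-z0-9_-]+', text)

print(plain_result)
print(raw_result)
```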
Upvotes: 0
Reputation: 10350
I would like to propose a non-regex solution:
string = 'LAST LAST2 First First2'
words = string.split(' ')  # equals ['LAST', 'LAST2', 'First', 'First2']
result = []
for word in words:
    # keep only the words that are not entirely uppercase
    if not word.isupper():
        result.append(word)
print(' '.join(result))
Result:
First First2
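The question states that the first name runs from the first mixed-case word to the end of the string. A variant that follows that rule literally, so that an all-uppercase word appearing after the first name has started is kept, could look like the sketch below. The function name first_name_part is hypothetical, and itertools.dropwhile is swapped in for the explicit loop.

```python
from itertools import dropwhile

def first_name_part(full_name):
    """Drop the leading all-uppercase words and join the rest into one string."""
    words = full_name.split()
    # dropwhile skips items only while the predicate holds, then yields the rest.
    return ' '.join(dropwhile(str.isupper, words))

print(first_name_part('LAST LAST2 First First2'))
print(first_name_part('LAST First JR'))
```

For the example in the question both approaches give the same result; they differ only when an uppercase word follows a mixed-case one.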
Upvotes: 2