Reputation: 629
So I have something like this:
data = ['Alice Smith and Bob', 'Tim with Sam Dunken', 'Uncle Neo & 31']
I want to replace every element with the first name it contains, so the list would look like this:
data = ['Alice Smith', 'Tim', 'Uncle Neo']
So far I got:
for i in range(len(data)):
    if re.match('(.*) and|with|\&', data[i]):
        a = re.match('(.*) and|with|\&', data[i])
        data[i] = a.group(1)
But it doesn't seem to work. I think the problem is my pattern, but I can't figure out the right way to write it.
Upvotes: 1
Views: 1095
Reputation: 22817
I would suggest using Casimir's answer if possible, but if you are not sure what word might follow (that is to say that "and", "with", and "&" are dynamic), then you can use this regex.
Note: This regex will not work for some special cases, such as names with apostrophes (') or dashes (-), but you can add them to the character list that you're searching for. This answer also depends on the name beginning with an uppercase character and on the "union word", as I'll name it ("and", "with", "&", etc.), not beginning with an uppercase character.
Regex
^((?:[A-Z][a-z]*\s*)+)\s.*
Substitution
$1
Input:
Alice Smith and Bob
Tim with Sam Dunken
Uncle Neo & 31
Output:
Alice Smith
Tim
Uncle Neo
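In Python, the regex and substitution above could be applied roughly as follows. This is only a sketch: the names `pattern` and `cleaned` are made up for illustration, and Python's `re.sub` writes the backreference as `\1` rather than `$1`.

```python
import re

# Pattern from this answer: capitalized words at the start, then the rest.
pattern = re.compile(r'^((?:[A-Z][a-z]*\s*)+)\s.*')

data = ['Alice Smith and Bob', 'Tim with Sam Dunken', 'Uncle Neo & 31']

# Replace the whole match with capture group 1 (the leading capitalized words).
cleaned = [pattern.sub(r'\1', s) for s in data]
print(cleaned)
```

Strings that do not match the pattern are returned unchanged by `re.sub`.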
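^: Assert position at the start of the line
[A-Z]: Match one uppercase ASCII letter
[a-z]*: Match any number of lowercase ASCII letters
\s*: Match any number of whitespace characters
(...)+: Match one or more times, where ... contains everything above
$1: Replace with capture group 1
Upvotes: 0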
Reputation: 92854
Simplify your approach to the following:
import re
data = ['Alice Smith and Bob', 'Tim with Sam Dunken', 'Uncle Neo & 31']
data = [re.search(r'.*(?= (and|with|&))', i).group() for i in data]
print(data)
The output:
['Alice Smith', 'Tim', 'Uncle Neo']
.*(?= (and|with|&)) - positive lookahead assertion; it ensures that the name/surname matched by .* is followed by a space and any item from the alternation group (and|with|&)
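Two caveats with the lookahead approach may be worth sketching. `re.search` returns `None` when no separator is present, so calling `.group()` directly would raise an `AttributeError`, and the greedy `.*` runs up to the last separator rather than the first. The helper `first_name` below is hypothetical, and the lazy `.*?` and the trailing space inside the lookahead are variations on the pattern above, not part of it.

```python
import re

def first_name(s):
    """Return the text before the first separator, or s itself if none is found."""
    # Lazy .*? stops at the first position followed by " and ", " with " or " & ".
    m = re.search(r'.*?(?= (?:and|with|&) )', s)
    return m.group() if m else s

data = ['Alice Smith and Bob', 'Tim with Sam Dunken', 'Uncle Neo & 31']
names = [first_name(s) for s in data]
print(names)

# Greedy vs. lazy on a string that contains two separators.
greedy = re.search(r'.*(?= (and|with|&))', 'Tim with Sam and Bob').group()
lazy = first_name('Tim with Sam and Bob')
print(greedy, '|', lazy)
```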
Upvotes: 0
Reputation: 71451
You can try this:
import re
data = ['Alice Smith and Bob', 'Tim with Sam Dunken', 'Uncle Neo & 31']
final_data = [re.sub(r'\sand.*?$|\s&.*?$|\swith.*?$', '', i) for i in data]  # raw string avoids invalid-escape warnings for \s
print(final_data)
Output:
['Alice Smith', 'Tim', 'Uncle Neo']
Upvotes: 0
Reputation: 140168
The | needs grouping with parentheses in your attempt. Anyway, it's too complex.
I would just use re.sub to remove the separator word and the rest:
data = [re.sub(" (and|with|&) .*","",d) for d in data]
result:
['Alice Smith', 'Tim', 'Uncle Neo']
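To illustrate the grouping point: without parentheses, `|` splits the whole pattern, so the original reads as "`(.*) and`" or "`with`" or "`&`". Since `re.match` only matches at the start of the string, the last two alternatives cannot succeed on this data. The sketch below compares the two forms; the names `ungrouped` and `grouped` are illustrative only.

```python
import re

data = ['Alice Smith and Bob', 'Tim with Sam Dunken', 'Uncle Neo & 31']

# Ungrouped: three alternatives, "(.*) and" OR "with" OR "&".
ungrouped = re.compile(r'(.*) and|with|&')
# Grouped: a name, a space, one of the separators, a space.
grouped = re.compile(r'(.*?) (?:and|with|&) ')

bad = []
good = []
for s in data:
    m_bad = ungrouped.match(s)
    m_good = grouped.match(s)
    bad.append(m_bad.group(1) if m_bad else None)
    good.append(m_good.group(1) if m_good else None)

print(bad)   # only the first string matches
print(good)
```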
Upvotes: 0
Reputation: 89547
Use a list comprehension with re.split:
result = [re.split(r' (?:and|with|&) ', x)[0] for x in data]
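A self-contained version of the split approach might look like this. The extra element `'Zed'` and the `maxsplit=1` argument are additions for illustration: a string without a separator comes back as a one-element list, so `[0]` returns it unchanged, and `maxsplit=1` stops splitting after the first separator.

```python
import re

data = ['Alice Smith and Bob', 'Tim with Sam Dunken', 'Uncle Neo & 31', 'Zed']

# Split once on " and ", " with " or " & " and keep the part before it.
result = [re.split(r' (?:and|with|&) ', x, maxsplit=1)[0] for x in data]
print(result)
```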
Upvotes: 2