Reputation: 357
I have a column of names that are all concatenated (that is, there is no space between the first and last name). I am trying to split the first and last names, which has already been asked on this website. However, here some names contain dashes (-) or apostrophes ('):
Speed-WagonMario
CruiserPetey
SthesiaAnna
De’wayneJohn
I want to make sure these cases are caught by my regex:
clean_names = re.split(r'([A-Z][a-z\']+\-[A-Z][a-z\']+|[A-Z][a-z\']+)', names)
It works for dashes, which happen only before an uppercase letter, but not for apostrophes.
Does anyone have an idea how to fix my query? Thanks in advance.
Upvotes: 0
Views: 254
Reputation: 48640
You can combine a positive lookbehind (lowercase letter) with a positive lookahead (uppercase letter). Lookarounds are zero-width, so the split happens at the position between the two letters and both letters are kept in the resulting parts.
/ // BEGIN EXPRESSION
(?<=[a-z]) // POSITIVE LOOKBEHIND [a-z]
(?=[A-Z]) // POSITIVE LOOKAHEAD [A-Z]
/ // END EXPRESSION
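To make the "both are kept" point concrete, here is a minimal sketch that contrasts a consuming pattern with the lookaround version on one of the sample names. The variable names are only illustrative, and splitting on an empty match assumes Python 3.7 or later.

```python
import re

name = "CruiserPetey"

# A consuming pattern eats the two characters it matches at the boundary.
consumed = re.split(r"[a-z][A-Z]", name)

# Lookarounds only assert a position, so nothing is removed (Python 3.7+).
kept = re.split(r"(?<=[a-z])(?=[A-Z])", name)

print(consumed)  # ['Cruise', 'etey']
print(kept)      # ['Cruiser', 'Petey']
```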
#!/usr/bin/env python3
import re


def pair_to_person(pair):
    # Each entry stores the last name first, followed by the first name.
    return {'firstName': pair[1], 'lastName': pair[0]}


def parse_name_column(column_text):
    # Split every line at the lowercase-to-uppercase boundary.
    # Splitting on an empty match requires Python 3.7 or later.
    return [pair_to_person(re.split(r'(?<=[a-z])(?=[A-Z])', name.strip()))
            for name in column_text.strip().split('\n')]


def print_list(items):
    print('\n'.join(map(str, items)))


if __name__ == '__main__':
    column_text = '''
Speed-WagonMario
CruiserPetey
SthesiaAnna
De’wayneJohn
'''
    names = parse_name_column(column_text)
    print_list(names)
Output:
{'firstName': 'Mario', 'lastName': 'Speed-Wagon'}
{'firstName': 'Petey', 'lastName': 'Cruiser'}
{'firstName': 'Anna', 'lastName': 'Sthesia'}
{'firstName': 'John', 'lastName': 'De’wayne'}
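If you would rather stay close to the original query, one possible fix is sketched below. It assumes the data really contains the typographic apostrophe (’, U+2019) shown in the sample, which the straight apostrophe in the original character class would not match. The `NAME_PART` pattern and `split_name` helper are hypothetical names for illustration, and `findall` is used instead of `re.split` so that no empty strings appear in the result.

```python
import re

# Hypothetical pattern: a capitalized part, optionally followed by more
# capitalized parts joined with dashes. The character class accepts both
# the straight apostrophe (') and the typographic one (U+2019).
NAME_PART = re.compile(r"[A-Z][a-z'\u2019]+(?:-[A-Z][a-z'\u2019]+)*")


def split_name(joined):
    """Return the name parts found in a concatenated string."""
    return NAME_PART.findall(joined)


samples = ["Speed-WagonMario", "CruiserPetey", "SthesiaAnna", "De\u2019wayneJohn"]
for sample in samples:
    print(split_name(sample))
```

Unlike the lookaround split, this variant only accepts the characters listed in the class, so any other punctuation inside a name would need to be added explicitly.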
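The same split in JavaScript (lookbehind assertions require an ES2018-capable engine):
const data = `
Speed-WagonMario
CruiserPetey
SthesiaAnna
De’wayneJohn
`;
// Split each line at the lowercase-to-uppercase boundary: [lastName, firstName].
const names = data.trim().split('\n')
  .map(name => name.trim().split(/(?<=[a-z])(?=[A-Z])/))
  .map(pair => ({ firstName: pair[1], lastName: pair[0] }));
console.log(names);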
Upvotes: 2