Reputation: 133
I have a dataset, however it includes Position #, "Lastname, Firstname M ",,
I am able to split it, however i want the middle initial gone, and the white space gone too, this is what i have:
df = pd.read_excel('C:\\HR employees\\EE Listing as of 5-30-17.xlsx')
df['Last Name'], df['First Name'] = df['NAME'].str.split(',', 1).str
del df['NAME']
df.to_excel('output.xlsx')
It splits the firstname and last name, but how can i get rid of the middle initial?
Upvotes: 0
Views: 430
Reputation: 1481
df = pd.read_excel('C:\\HR employees\\EE Listing as of 5-30-17.xlsx')
df['Last Name'], df['First Name'] = df['NAME'].str.split(',').str
df['First Name'] = df['First Name'].apply(lambda s: s[:-2] if s[-2] == ' ' else s)
del df['NAME']
df.to_excel('output.xlsx')
The lambda function checks if the second last character in the first name is a space. This removes the initial (if present) while also handling two-part first names.
Test cases: (NAME ---> Last Name ---> First Name)
Upvotes: 2
Reputation: 866
What about a loop to copy the first name up until the space between the middle letter?
df['First Name'] = 'dave M'
name = ''
for ch in df['First Name']:
if ch != ' ':
name += ch
else: break
df['First Name'] = name
# OR ALTERNATIVLY
df['First Name'] = df['First Name'].split()[0]
Upvotes: 2