davidjbeiler
davidjbeiler

Reputation: 133

Split string with names and middle initial? (python3)

I have a dataset, however it includes Position #, "Lastname, Firstname M ",,

I am able to split it, however i want the middle initial gone, and the white space gone too, this is what i have:

df = pd.read_excel('C:\\HR employees\\EE Listing  as of 5-30-17.xlsx')

df['Last Name'], df['First Name'] = df['NAME'].str.split(',', 1).str
del df['NAME']

df.to_excel('output.xlsx')

It splits the firstname and last name, but how can i get rid of the middle initial?

Upvotes: 0

Views: 430

Answers (2)

Aditya Barve
Aditya Barve

Reputation: 1481

df = pd.read_excel('C:\\HR employees\\EE Listing  as of 5-30-17.xlsx')

df['Last Name'], df['First Name'] = df['NAME'].str.split(',').str
df['First Name'] = df['First Name'].apply(lambda s: s[:-2] if s[-2] == ' ' else s)
del df['NAME']    

df.to_excel('output.xlsx')

The lambda function checks if the second last character in the first name is a space. This removes the initial (if present) while also handling two-part first names.

Test cases: (NAME ---> Last Name ---> First Name)

  • Smith, John R ---> Smith ---> John
  • JOHN, DOE ---> JOHN ---> DOE
  • Jackson, Jean Paul M ---> Jackson ---> Jean Paul
  • Gonzales, El Torado ---> Gonzales ---> El Torado

Upvotes: 2

syntaxError
syntaxError

Reputation: 866

What about a loop to copy the first name up until the space between the middle letter?

df['First Name'] = 'dave M'
name = ''
for ch in df['First Name']:
     if ch != ' ':
          name += ch
     else: break
df['First Name'] = name


# OR ALTERNATIVLY 
df['First Name'] = df['First Name'].split()[0]

Upvotes: 2

Related Questions