Reputation: 187
I have a pandas DataFrame with a column of names (such as Joseph Haydn, Wolfgang Amadeus Mozart, Antonio Salieri and so forth).
I want to add a new column with the last names: Haydn, Mozart, Salieri and so forth.
I know how to split a single string, but I could not find a way to apply the split to a Series or a DataFrame column.
Upvotes: 14
Views: 49427
Reputation: 1
Try this to solve your problem:
import pandas as pd
df = pd.DataFrame(
    {'composers':
        [
            'Joseph Haydn',
            'Wolfgang Amadeus Mozart',
            'Antonio Salieri',
            'Eumir Deodato',
        ]
    }
)
# Split each name on whitespace and keep the last token as the last name
df['lastname'] = df['composers'].str.split().str[-1]
The DataFrame now looks like this:
                 composers lastname
0             Joseph Haydn    Haydn
1  Wolfgang Amadeus Mozart   Mozart
2          Antonio Salieri  Salieri
3            Eumir Deodato  Deodato
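One thing that is easy to trip over with str.split is the n argument, which limits the number of splits. The sketch below is only an illustration on a two-row Series (the variable names are made up for this example); it shows why indexing with [1] can return a middle name or a merged remainder, while [-1] returns the last token.

```python
import pandas as pd

names = pd.Series(["Joseph Haydn", "Wolfgang Amadeus Mozart"])

# Default n (and also n=0) means: split at every run of whitespace
all_parts = names.str.split()
# n=1 splits only once, so everything after the first space stays together
one_split = names.str.split(n=1)

second_of_all = all_parts.str[1].tolist()   # second token of each row
second_of_one = one_split.str[1].tolist()   # remainder after the first split
last_of_all = all_parts.str[-1].tolist()    # last token of each row

print(second_of_all)
print(second_of_one)
print(last_of_all)
```

Indexing from the end with [-1] does not depend on how many words a name contains, which is why it suits the last-name case.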
Upvotes: -1
Reputation: 18695
If you have:
import pandas
data = pandas.DataFrame({"composers": [
"Joseph Haydn",
"Wolfgang Amadeus Mozart",
"Antonio Salieri",
"Eumir Deodato"]})
and assuming you want only the first name (and not a middle name like Amadeus), then:
data.composers.str.split(r'\s+').str[0]
will give:
0 Joseph
1 Wolfgang
2 Antonio
3 Eumir
dtype: object
You can assign this to a new column in the same DataFrame:
data['firstnames'] = data.composers.str.split(r'\s+').str[0]
Last names would be:
data.composers.str.split(r'\s+').str[-1]
which gives:
0 Haydn
1 Mozart
2 Salieri
3 Deodato
dtype: object
(See also the question "Python Pandas: selecting element in array column" for accessing elements in an 'array' column.)
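As a rough illustration of what element access on such a list-valued column looks like, the sketch below compares three ways of picking the last element of each row. The variable names are hypothetical; the calls are standard pandas string-accessor methods, and all three should give the same values here.

```python
import pandas as pd

data = pd.DataFrame({"composers": ["Joseph Haydn", "Wolfgang Amadeus Mozart"]})

# After the split, each row holds a plain Python list of tokens
parts = data["composers"].str.split()

via_indexer = parts.str[-1]                         # indexing through the .str accessor
via_get = parts.str.get(-1)                         # the method form of the same access
via_apply = parts.apply(lambda tokens: tokens[-1])  # explicit per-row Python indexing

print(via_indexer.tolist())
```

The .str forms return NaN for rows where the element is missing, whereas the apply version would raise on a missing value, so the accessor is usually the safer choice.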
For everything except the last name, you can apply " ".join(..) to all but the last element ([:-1]) of each row:
data.composers.str.split(r'\s+').str[:-1].apply(lambda parts: " ".join(parts))
which gives:
0 Joseph
1 Wolfgang Amadeus
2 Antonio
3 Eumir
dtype: object
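If you want both pieces at once, a possible alternative (not part of the approach above) is str.rsplit with n=1 and expand=True, which cuts each name once at its last run of whitespace. This is a minimal sketch; the column names givennames and lastname are made up for the example, and it assumes every name contains at least two words.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# rsplit starts from the right; n=1 allows a single split,
# expand=True returns the two pieces as separate columns
data[["givennames", "lastname"]] = data["composers"].str.rsplit(n=1, expand=True)

print(data)
```

A single-word name would end up entirely in the first column with a missing value in the second, so check for that case if your data may contain it.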
Upvotes: 31