Reputation: 16229
I have a pandas DataFrame consisting of a single column, extracted from the From field of emails, e.g.:
From
0 Grey Caulfu <[email protected]>
1 Deren Torculas <[email protected]>
2 Charlto Youna <[email protected]>
I want to use the str accessor to split the data into two columns: the first column, Name, should contain the actual name (first name and last name), and the second column, Email, should contain the email address.
If I use:
df = pd.DataFrame(df.From.str.split(' ', n=1).tolist(), columns=['Name', 'Email'])
This is almost what I need, but it puts the surname in the Email column (i.e. it places the last two items from split() into this column). How do I modify this so that the split keeps the full name in the first column and only the address in the second?
Once that works, it also needs to be a little more robust, so that it can handle names that contain three elements, e.g.:
Billy R. Valentine <[email protected]>
Yurimov | Globosales <[email protected]>
Upvotes: 2
Views: 2162
Reputation: 394159
You can pass expand=True to rsplit() and create the new columns directly from the string column, without having to create a new df:
In [353]:
df[['Name','e-mail']] = df['From'].str.rsplit(' ', n=1, expand=True)
df
Out[353]:
From Name \
0 Grey Caulfu <[email protected]> Grey Caulfu
1 Deren Torculas <[email protected]> Deren Torculas
2 Charlto Youna <[email protected]> Charlto Youna
e-mail
0 <[email protected]>
1 <[email protected]>
2 <[email protected]>
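For reference, here is a minimal sketch of the same approach applied to names with more than two parts, like the later examples in the question. It assumes pandas 2.0 or later, where n has to be passed by keyword. The example.com addresses are placeholders invented for illustration, not the original data, and the bracket-stripping step is an optional extra.

```python
import pandas as pd

# Placeholder rows; the addresses are invented for illustration only.
df = pd.DataFrame({"From": [
    "Grey Caulfu <grey@example.com>",
    "Billy R. Valentine <billy@example.com>",
    "Yurimov | Globosales <yurimov@example.com>",
]})

# Split once, counting from the right, so only the last space
# separates the name from the address.
df[["Name", "Email"]] = df["From"].str.rsplit(" ", n=1, expand=True)

# Optional: drop the surrounding angle brackets from the address.
df["Email"] = df["Email"].str.strip("<>")

print(df[["Name", "Email"]])
```

Because the split counts from the right, the number of words in the name does not matter, as long as the address itself contains no spaces.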
Upvotes: 0
Reputation: 90979
You can use rsplit() instead of split() to split starting from the right. Example:
In [12]: df1 = pd.DataFrame(df.From.str.rsplit(' ', n=1).tolist(), columns=['Name','Email'])
In [13]: df1
Out[13]:
Name Email
0 Grey Caulfu <[email protected]>
1 Deren Torculas <[email protected]>
2 Charlto Youna <[email protected]>
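If the angle brackets should not end up in the Email column, one possible alternative (a sketch, not part of the answer above) is str.extract() with a regular expression. The pattern and the example.com addresses below are assumptions made for illustration.

```python
import pandas as pd

# Placeholder rows; the addresses are invented for illustration only.
df = pd.DataFrame({"From": [
    "Grey Caulfu <grey@example.com>",
    "Deren Torculas <deren@example.com>",
    "Billy R. Valentine <billy@example.com>",
]})

# Named groups become the column names of the extracted DataFrame.
# Name: everything before the final "<...>"; Email: the text inside the brackets.
pattern = r"^(?P<Name>.*?)\s*<(?P<Email>[^<>]+)>\s*$"
df1 = df["From"].str.extract(pattern)

print(df1)
```

Rows that do not match the pattern come back as NaN in both columns, which makes malformed From values easy to spot.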
Upvotes: 3