Pyderman

Reputation: 16229

Splitting a pandas DataFrame of email 'From' field into sender's name, email address

I have a pandas DataFrame with a single column, extracted from the From field of emails, e.g.

                                                   From
0          Grey Caulfu <[email protected]>
1                   Deren Torculas <[email protected]>
2            Charlto Youna <[email protected]>

I want to use the str accessor to split the data into two columns: the first, Name, should contain the sender's name (first name and last name), and the second, Email, should contain the email address.

If I use:

df = pd.DataFrame(df.From.str.split(' ', n=1).tolist(),
                                   columns = ['Name','Email'])

This is almost what I need, but it puts the surname in the Email column (i.e. it places the last two items from split() into this column). How do I modify this so that split() knows to stop after the first space when populating the first column?

Once we achieve this, we then need to make it a little more robust, so that it can handle names that contain three elements e.g.

Billy R. Valentine <[email protected]>
Yurimov | Globosales <[email protected]>

Upvotes: 2

Views: 2162

Answers (2)

EdChum

Reputation: 394159

You can pass expand=True to rsplit() and assign the result straight to new columns, without having to create a new df:

In [353]:
df[['Name','e-mail']] = df['From'].str.rsplit(' ', n=1, expand=True)
df

Out[353]:
                                         From            Name  \
0         Grey Caulfu <[email protected]>     Grey Caulfu   
1  Deren Torculas <[email protected]>  Deren Torculas   
2    Charlto Youna <[email protected]>   Charlto Youna   

                        e-mail  
0      <[email protected]>  
1  <[email protected]>  
2   <[email protected]>  
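Because rsplit() splits from the right, the same call should also cope with the three-element names from the question, and the angle brackets can be stripped afterwards. A minimal sketch, assuming every row ends in a single space followed by the bracketed address; the example.com addresses are placeholders made up for illustration, not the ones from the question:

```python
import pandas as pd

# Hypothetical sample rows; the addresses are placeholders.
df = pd.DataFrame({'From': [
    'Grey Caulfu <grey@example.com>',
    'Billy R. Valentine <billy@example.com>',
    'Yurimov | Globosales <sales@example.com>',
]})

# Split once, starting from the right: everything before the
# last space becomes the name, the rest is the address.
df[['Name', 'Email']] = df['From'].str.rsplit(' ', n=1, expand=True)

# Drop the surrounding angle brackets from the address.
df['Email'] = df['Email'].str.strip('<>')

print(df[['Name', 'Email']])
```

Passing n as a keyword also keeps the call working on recent pandas releases, where only the separator may be given positionally.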

Upvotes: 0

Anand S Kumar

Reputation: 90979

You can use rsplit() instead of split() to split from the right, so that only the last space separates the name from the address. Example:

In [12]: df1 = pd.DataFrame(df.From.str.rsplit(' ', n=1).tolist(), columns=['Name','Email'])

In [13]: df1
Out[13]:
             Name                        Email
0     Grey Caulfu      <[email protected]>
1  Deren Torculas  <[email protected]>
2   Charlto Youna   <[email protected]>

Upvotes: 3
