Devesh

Reputation: 127

Extracting particular characters/text from a DataFrame column

I am trying to get the email provider from the mail column of a DataFrame and store it in a new column named "Mail_Provider". For example, taking gmail from [email protected] and storing it in the "Mail_Provider" column. I would also like to extract the country ISD code from the Phone column and create a new column for it. Is there a simpler, more direct method than regex?

import pandas as pd

data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "mail": ["[email protected]", "[email protected]", "[email protected]"],
    "Adress": ["Adress1", "Adress2", "Adress3"],
    "Phone": ["+91-1234567890", "+88-0987654321", "+27-2647589201"],
})

Table

Name  mail               Adress   Phone
A     [email protected]  Adress1  +91-1234567890
B     [email protected]  Adress2  +88-0987654321
C     [email protected]  Adress3  +27-2647589201

Expected result:

Name  mail               Adress   Phone           Mail_Provider  ISD
A     [email protected]  Adress1  +91-1234567890  gmail          91
B     [email protected]  Adress2  +88-0987654321  yahoo          88
C     [email protected]  Adress3  +27-2647589201  gmail          27

Upvotes: 5

Views: 1864

Answers (3)

Quang Hoang

Reputation: 150735

The regex for this is fairly simple. Use raw strings so that the backslashes reach the regex engine unchanged:

data['Mail_Provider'] = data['mail'].str.extract(r'@(\w+)\.')

data['ISD'] = data['Phone'].str.extract(r'\+(\d+)-')

If you really want to avoid regex, @Eva's answer would be the way to go.
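
For anyone who wants to run this end to end, here is a minimal sketch. The addresses are hypothetical stand-ins, since the ones in the question are redacted, and `expand=False` is an optional extra that makes `str.extract` return a Series when the pattern has a single capture group.

```python
import pandas as pd

# Hypothetical addresses; the originals in the question are redacted.
data = pd.DataFrame({
    "mail": ["a@gmail.com", "b@yahoo.com", "c@gmail.com"],
    "Phone": ["+91-1234567890", "+88-0987654321", "+27-2647589201"],
})

# Capture the word characters between "@" and the next "."
data["Mail_Provider"] = data["mail"].str.extract(r"@(\w+)\.", expand=False)

# Capture the digits between the leading "+" and the first "-"
data["ISD"] = data["Phone"].str.extract(r"\+(\d+)-", expand=False)

print(data)
```

Note that both new columns hold strings; the ISD column would need an explicit conversion such as `astype(int)` if numbers are wanted.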

Upvotes: 9

RomanPerekhrest

Reputation: 92854

Mixed approach (regex and simple slicing):

In [693]: df['Mail_Provider'] = df['mail'].str.extract('@([^.]+)')

In [694]: df['ISD'] = df['Phone'].str[1:3]

In [695]: df
Out[695]: 
  Name         mail   Adress           Phone Mail_Provider ISD
0    A  [email protected]  Adress1  +91-1234567890         gmail  91
1    B  [email protected]  Adress2  +88-0987654321         yahoo  88
2    C  [email protected]  Adress3  +27-2647589201         gmail  27
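
One caveat on the slicing step: `str[1:3]` assumes every code is exactly two digits, which holds for the sample data but not for dialing codes in general. A rough sketch of a regex-free alternative that splits on the first hyphen instead (the phone numbers below are made up for illustration):

```python
import pandas as pd

# Hypothetical numbers with 1-, 2- and 3-digit codes.
phones = pd.Series(["+1-2025550123", "+91-1234567890", "+880-1712345678"])

# Fixed-width slice: only correct when the code has two digits.
fixed = phones.str[1:3]

# Take everything before the first "-", then strip the leading "+".
flexible = phones.str.split("-", n=1).str[0].str.lstrip("+")

print(pd.DataFrame({"Phone": phones, "fixed": fixed, "flexible": flexible}))
```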

Upvotes: 5

eva-vw

Reputation: 670

A lambda function will work:

data['Mail_Provider'] = data['mail'].apply(lambda x: x.split("@")[1].split(".")[0])

data['ISD'] = data['Phone'].apply(lambda x: x.split("+")[1].split("-")[0])
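
If the column may contain missing values, the lambda raises an AttributeError because `None` and `NaN` have no `split` method. A possible variation, sketched here with made-up addresses, chains the vectorized `.str` accessors, which pass missing values through as NaN:

```python
import pandas as pd

# Hypothetical data with one missing address.
mails = pd.Series(["a@gmail.com", None, "c@yahoo.co.uk"])

# Same split logic as the lambda: the part after "@", then the part
# before the first "." in it. Missing entries stay missing.
provider = mails.str.split("@").str[1].str.split(".").str[0]

print(provider)
```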

Upvotes: 4
