Reputation: 11
I am new to Pandas and have a hard time figuring out the best way to solve the below problem:
I have a Dataframe with one column called Email like below:
Email
[email protected]
[email protected]
NAN
[email protected]
[email protected]
I separated the strings on '@' to create a Domain column and want to use that column to assign keywords in a new column. For instance, if the domain contains the word 'yahoo', call it 'Yahoo Account' in the new column. If it does not contain the word 'yahoo', assign it the value 'Other Domain', and if it is NaN, call it 'Unknown'.
The new column called Affiliation would look like:
Affiliation
Yahoo Account
Other Domain
Unknown
Yahoo Account
Other Domain
There are over 2,000 different types of domains so am looking for a way where I don't list and map all the unique domains as either being "Yahoo Account" or "Other Domain."
I have looked into a few options, one of which is the where clause, but it assigns NaN values to the Other Domain keyword.
df['Affiliation'] = np.where(df['Domain']=='yahoo', 'Yahoo Account', 'Other Domain')
I have also started to look at using the replace clause, but don't think this is the best way due to the amount of unique domains there are that would need to be added to other_affiliations. See below:
yahoo_affiliations = (r'(yahoo\S*)')
other_affiliations= (r'(gmail\S*)|(hotmail\S*)|(outlook\S*)')
# Create a new column called Affiliation from Domains
df['Affiliation'] = df['Domain']
# Fill NaN with Unknwon
df['Affiliation'] = df['Affiliation'].fillna('Unknown')
replacements = {
'Affiliation': {yahoo_affiliations: 'Yahoo Account',
other_affiliations: 'Other Domain'}
}
df.replace(replacements, regex=True, inplace=True)
Upvotes: 0
Views: 46
Reputation: 7204
You can split them like this to get your mappings
email_map = {'yahoo.com': 'Yahoo Account',
'gmail.com': 'Other Domain',
'gmail.com.it': 'Other Domain'
}
dfa['domain'] = dfa['Email'].str.extract(r'.*?@(.*)')
dfa['Affiliation'] = dfa['domain'].map(email_map).fillna('Unknown')
output:
Email domain Affiliation
0 [email protected] yahoo.com Yahoo Account
1 [email protected] gmail.com Other Domain
2 NAN NaN Unknown
3 [email protected] yahoo.com Yahoo Account
4 [email protected] gmail.com.it Other Domain
Upvotes: 1