Inquisitive

Reputation: 55

Extracting username from email using Python pandas

I'm trying to read a bunch of email addresses from a CSV file and extract the usernames from them using pandas, but I'm getting the following error message instead.

ValueError: Length of values does not match length of index

Here is my code, which is not complex.

import pandas as pd
import sys

input_file = sys.argv[1]

data_frame = pd.read_csv(input_file)

data_frame['Username'] = data_frame['Email Domain'].str.split("@")[0]

print(data_frame)

What am I doing wrong? Thanks,

Upvotes: 2

Views: 4138

Answers (2)

piRSquared

Reputation: 294488

Strictly trying to explain what you are doing wrong:

data_frame['Email Domain'].str.split("@")[0]
#           splits the strings /           \ points to the first row of result
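
To make that concrete, here is a minimal sketch of why the assignment fails. The sample addresses are made up for illustration (they are not from the question's CSV), and it assumes the default integer index that read_csv produces.

```python
import pandas as pd

# Hypothetical sample data; the addresses are invented for illustration.
data_frame = pd.DataFrame(
    {'Email Domain': ['abc@example.com', 'xyz@example.org', 'blah24@example.net']})

# Series of lists, one list per row.
parts = data_frame['Email Domain'].str.split("@")

# [0] is a lookup on the Series index, so it returns the list stored in row 0.
first_row = parts[0]
print(first_row)  # ['abc', 'example.com']

# Assigning that 2-element list to a 3-row frame reproduces the error.
error_message = None
try:
    data_frame['Username'] = first_row
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Note that with exactly two rows in the file the lengths would happen to match, and the column would be filled with wrong values without any error.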

Solution

data_frame['Email Domain'].str.split("@").str[0]

0       abc
1       xyz
2    blah24
Name: Email Domain, dtype: object

setup
Thanks A-Za-z

df = pd.DataFrame(
    {'username': ['[email protected]', '[email protected]', '[email protected]']})
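
Applied to the script from the question, the fix might look like the sketch below. The CSV text is a hypothetical stand-in for the file passed in sys.argv[1]; only the 'Email Domain' column name is taken from the question.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the input file from the question.
csv_text = "Email Domain\nabc@example.com\nxyz@example.org\nblah24@example.net\n"

data_frame = pd.read_csv(io.StringIO(csv_text))

# .str[0] takes the first element of every row's list,
# so the result has one value per row and the lengths match.
data_frame['Username'] = data_frame['Email Domain'].str.split("@").str[0]

print(data_frame)
```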

Upvotes: 3

Vaishali

Reputation: 38415

Consider this df

df = pd.DataFrame({'username': ['[email protected]', '[email protected]', '[email protected]']})

You can use str.extract to get the usernames like this:

df.username.str.extract("(.*)@")

You get

0       abc
1       xyz
2    blah24

Compare this with

df.username.str.split("@")[0]

You get only the first row:

['abc', 'gmail.com']
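
One caveat, as far as I know: in recent pandas versions str.extract returns a DataFrame by default, even with a single capture group. Passing expand=False should give back a Series like the output shown above. A small sketch, with made-up addresses:

```python
import pandas as pd

# Hypothetical sample data; the addresses are invented for illustration.
df = pd.DataFrame(
    {'username': ['abc@example.com', 'xyz@example.org', 'blah24@example.net']})

# With one capture group and expand=False the result is a Series, not a DataFrame.
usernames = df['username'].str.extract(r"(.*)@", expand=False)

print(usernames)
```

The Series can then be assigned straight to a new column, since it has one value per row.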

Upvotes: 2
