Reputation: 55
I'm trying to read a bunch of email addresses from a CSV file and extract the usernames from them using Pandas, but I'm getting the following error message instead.
ValueError: Length of values does not match length of index
Here is my code, which is not complex.
import pandas as pd
import sys
input_file = sys.argv[1]
data_frame = pd.read_csv(input_file)
data_frame['Username'] = data_frame['Email Domain'].str.split("@")[0]
print(data_frame)
What am I doing wrong? Thanks,
Upvotes: 2
Views: 4138
Reputation: 294488
Strictly trying to explain what you are doing wrong
data_frame['Email Domain'].str.split("@")[0]
# .str.split("@") splits the strings
# [0] points to the first row of the result
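To see why that raises the error, it may help to look at the intermediate values. This is only a rough sketch: the three addresses and the example.com domain are made up for illustration, and only the column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
data_frame = pd.DataFrame(
    {"Email Domain": ["abc@example.com", "xyz@example.com", "blah24@example.com"]}
)

parts = data_frame["Email Domain"].str.split("@")  # Series holding one list per row
first_row = parts[0]                               # label 0: the list for the first row only

try:
    # A 2-element list cannot fill a 3-row column, so pandas refuses the assignment.
    data_frame["Username"] = first_row
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(type(parts).__name__)
print(first_row)
print(error_message)
```

So `[0]` is a row lookup on the Series, not "first element of each list", and the resulting list has the wrong length for a new column.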
Solution
data_frame['Email Domain'].str.split("@").str[0]
0 abc
1 xyz
2 blah24
Name: Email Domain, dtype: object
Setup (thanks, A-Za-z)
df = pd.DataFrame(
{'username': ['abc@gmail.com', 'xyz@example.com', 'blah24@example.com']})  # example.com is a placeholder domain
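Applied to the script in the question, the fix might look roughly like this. The CSV contents below are hypothetical and an in-memory `io.StringIO` buffer stands in for the file passed as `sys.argv[1]`, so treat it as a sketch rather than a drop-in replacement.

```python
import io

import pandas as pd

# Stand-in for the CSV file named on the command line; the rows are made up.
csv_text = "Email Domain\nabc@example.com\nxyz@example.com\nblah24@example.com\n"

data_frame = pd.read_csv(io.StringIO(csv_text))

# .str[0] takes the first element of every row's list, so the result
# has one value per row and lines up with the existing index.
data_frame["Username"] = data_frame["Email Domain"].str.split("@").str[0]

print(data_frame)
```

With a real file you would pass `input_file` to `pd.read_csv` as in the original code; only the line that builds `Username` changes.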
Upvotes: 3
Reputation: 38415
Consider this df
df = pd.DataFrame({'username': ['abc@gmail.com', 'xyz@example.com', 'blah24@example.com']})  # example.com is a placeholder domain
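You can use str.extract to get the usernames like this (expand=False makes recent pandas versions return a Series rather than a one-column DataFrame)
df.username.str.extract("(.*)@", expand=False)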
You get
0 abc
1 xyz
2 blah24
Compare this with
df.username.str.split("@")[0]
You get only the first row
['abc', 'gmail.com']
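If you want to check the two working approaches against each other, something like the following should do. The addresses are placeholders I made up, and `expand=False` is passed so that `str.extract` returns a Series.

```python
import pandas as pd

# Placeholder addresses for illustration only.
df = pd.DataFrame(
    {"username": ["abc@example.com", "xyz@example.com", "blah24@example.com"]}
)

# Regex capture of everything before the "@".
via_extract = df["username"].str.extract(r"(.*)@", expand=False)

# Split on "@" and take the first piece of every row.
via_split = df["username"].str.split("@").str[0]

same = via_extract.tolist() == via_split.tolist()
print(via_extract.tolist(), same)
```

One difference worth keeping in mind: for a value with no "@" at all, the regex does not match and `str.extract` gives NaN, while the split version returns the whole string unchanged.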
Upvotes: 2