jeangelj

Reputation: 4498

Python Pandas Index error: List Index out of range

My code worked on a previous dataset but has now stopped working. I looked through other answers for this error message, but none seem applicable to my case.

I have one column in my dataframe df, Email_Address, and I would like to split the domain out into a new column.

My dataframe is a subset of a previous df.

#create new df, for only email addresses I need to review
df = df_raw.loc[df_raw['Review'] == 'Y'].copy()

#I reset the index to fix the problem, but it didn't help
df = df.reset_index(drop=True)

#ensure Email Address is a string
df['Email_Address'] = df['Email_Address'].apply(str)

#make Email Address lower case
df['email_lowercase'] = df['Email_Address'].str.lower()

#Split out domain into a new column 
df['domain'] = df['email_lowercase'].apply(lambda x: x.split('@')[1])

IndexError: list index out of range

Upvotes: 2

Views: 4973

Answers (1)

Jan Zeiseweis

Reputation: 3738

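You most likely have invalid emails in your dataframe, i.e. values that contain no '@'. Note that apply(str) turns missing values (NaN) into the string 'nan'; splitting that on '@' gives a one-element list, so [1] raises the IndexError. You can identify these rows by using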

df[~df.Email_Address.astype(str).str.contains('@')]
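
To make the cause concrete, here is a minimal sketch that reproduces the problem. The sample values are made up for illustration and are not from the asker's data; only the column name Email_Address comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one valid address, one missing value, one malformed string
df = pd.DataFrame({"Email_Address": ["Alice@Example.com", np.nan, "not-an-email"]})

# apply(str) turns the missing value into the literal string "nan"
as_text = df["Email_Address"].apply(str)

# Rows without an "@" are the ones that make split("@")[1] fail
invalid = df[~as_text.str.contains("@", regex=False)]
print(invalid)

# Reproduce the error on a single bad value
try:
    "nan".split("@")[1]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```

If the filter returns any rows for your data, those are the ones that trigger the IndexError.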

You could use this approach to extract the domain, which returns None for values without an '@':

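def extract_domain(email):
    # Split on '@'; a valid address yields at least two parts
    email_domain = email.split('@')
    if len(email_domain) > 1:
        return email_domain[1]
    # No '@' found: return None so the row gets a missing value
    return None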

df['domain'] = df['email_lowercase'].apply(extract_domain)

or even shorter:

df['domain'] = df['email_lowercase'].str.split('@').apply(lambda li: li[1] if len(li) > 1 else None)
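
As a possible alternative, the pandas string accessor can do the same thing without a lambda. This is only a sketch on made-up sample data; the column name domain_regex is hypothetical and added here just to show a second variant.

```python
import pandas as pd

# Hypothetical sample data, already lower-cased as in the question
df = pd.DataFrame(
    {"email_lowercase": ["alice@example.com", "nan", "bob@mail.example.org"]}
)

# .str[1] yields a missing value instead of raising when the split list is too short
df["domain"] = df["email_lowercase"].str.split("@").str[1]

# Variant: capture everything after the last "@" with a regular expression
df["domain_regex"] = df["email_lowercase"].str.extract(r"@([^@]+)$", expand=False)
print(df)
```

Both columns end up with a missing value for the row that has no '@'. The two variants can differ for strings containing more than one '@', since the split version takes the second piece while the regex takes the part after the last '@'.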

Upvotes: 3
