email validation in python pandas dataframe column

Question

I want to highlight in red every cell in a pandas dataframe column that fails the validity check below. Here is my code. In my implementation the whole Email column is highlighted red instead of only the cells that fail.

import pandas as pd

# raw data
df = pd.DataFrame({'Username' : ['arenzo', 'brenzo', 'crenzo', 'drenzo'],
              'Email' : ['place1@sales.org', 'place2@sales.org', 'place3@sales.com', 'place4@stack.net']})

# email validity function
def emailcheck (df):
    validcode = (df['Email'].str.contains('@')) & (df['Email'].str.contains('.org', case= False) & (df['Email'].str.contains('sales', case=False)))
    return validcode

def highlight_email(s):
    if emailcheck(df).all():
        color = ''
    else:
        color = 'red'
    return 'background-color: %s' % color

df.style.applymap(highlight_email, subset=pd.IndexSlice[:, ['Email']])

# dataframe
Username Email
arenzo   place1@sales.org
brenzo   place2@sales.org
crenzo   place3@sales.com
drenzo   place4@stack.net
# last 2 rows under email column should be highlighted red

DJK · Accepted Answer

You're currently checking the entire Email column against your condition:

validcode = (df['Email'].str.contains('@')) & (df['Email'].str.contains('.org', case= False) & (df['Email'].str.contains('sales', case=False)))

Then, in your if statement, you test whether all of those values are True:

if emailcheck(df).all():
        color = ''
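
To see what that test evaluates to, here is a minimal sketch on the sample data from the question. The name mask is only illustrative, and regex=False is an addition of mine so that '.org' is matched as a literal substring (by default str.contains treats the pattern as a regex, where '.' matches any character).

```python
import pandas as pd

df = pd.DataFrame({
    'Username': ['arenzo', 'brenzo', 'crenzo', 'drenzo'],
    'Email': ['place1@sales.org', 'place2@sales.org',
              'place3@sales.com', 'place4@stack.net'],
})

# One boolean per row: True where the email passes all three checks.
# regex=False (an assumption, not in the original) matches '.org' literally.
mask = (
    df['Email'].str.contains('@', regex=False)
    & df['Email'].str.contains('.org', case=False, regex=False)
    & df['Email'].str.contains('sales', case=False, regex=False)
)

print(mask.tolist())  # [True, True, False, False]

# .all() collapses the whole column into a single answer,
# so it cannot tell one cell from another.
print(mask.all())     # False
```

The per-row mask already holds the answer you want; it is the .all() call that throws the per-cell information away.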

For your data, emailcheck(df).all() is always False, because only some of the emails meet your criteria. The else branch therefore runs on every call, and your function returns the color red for every cell. Instead, drop the first function and have highlight_email test the individual value it receives:

def highlight_email(s):
    if '@' in s and '.org' in s and 'sales' in s:
        color = ''
    else:
        color = 'red'
    return 'background-color: {c}'.format(c=color)
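
For completeness, here is a hedged sketch of how the per-cell check might be wired into the Styler. Two things here are my own assumptions rather than part of the answer: the lower() call, which keeps the case-insensitive intent of case=False in the question, and the helper names highlight_email_column and style_emails, which are hypothetical. It uses Styler.apply column-wise, which I believe behaves the same across pandas versions, whereas Styler.applymap was deprecated in favor of Styler.map in pandas 2.1.

```python
import pandas as pd

def highlight_email(s):
    # Per-cell check on a single value. lower() makes it case-insensitive,
    # mirroring case=False in the question (an assumption, not in the answer).
    s = str(s).lower()
    ok = '@' in s and '.org' in s and 'sales' in s
    return '' if ok else 'background-color: red'

def highlight_email_column(col):
    # Styler.apply passes a whole column (a Series);
    # return one CSS string per cell, in the same order.
    return [highlight_email(v) for v in col]

def style_emails(frame):
    # Hypothetical helper: restrict the styling to the Email column.
    return frame.style.apply(highlight_email_column, subset=['Email'])

df = pd.DataFrame({
    'Username': ['arenzo', 'brenzo', 'crenzo', 'drenzo'],
    'Email': ['place1@sales.org', 'place2@sales.org',
              'place3@sales.com', 'place4@stack.net'],
})

styles = highlight_email_column(df['Email'])
print(styles)  # ['', '', 'background-color: red', 'background-color: red']
```

In a notebook, style_emails(df) returns the Styler to display, with only the last two Email cells in red.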
