git hubber

Reputation: 9

How to reference a single column in a def function

First off, I know a regular expression is not the best way to validate an email address, but this is a preliminary step; a better validation comes later.

I want to create a function that checks whether an email address is valid, but I am not sure how to reference only one column in a data frame.

import pandas as pd

d=[['Automotive','testgmail.com','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])

filename='temp'

I want to keep the code in a def function like the one below:

def Prospect(colname, errors):
    wrong = []
    # <reference to column> is the part I don't know how to write
    if <reference to column>.str.match(r"^.+@.+\..{2,}$"):
        return
    else:
        error = 'this is an invalid email'
        wrong.append(error)
        return wrong


print(Prospect(colname, errors))

How do I write a function that takes a column name, runs only that column of the data frame through the check, and prints a message saying that the email is invalid?

P.S.: Speed is not a major concern, since the datasets are not massive.

desired output:

This is an invalid email

Upvotes: 0

Views: 301

Answers (3)

mozway

Reputation: 262224

I believe you might want:

def Prospect(colname, errors, df=df):
    # True for each row whose value matches the email pattern
    m = df[colname].str.match(r"^.+@.+\..{2,}$")

    # record a single error if any row fails the check
    if not m.all():
        errors.append('this is an invalid email')
    
errors = []
Prospect('email', errors, df=df)

print(errors)

output: ['this is an invalid email']
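
If you also need to know which addresses failed, rather than only whether any did, the same boolean mask can be inverted to select the failing rows. This is a minimal sketch, not part of the original code: `invalid_emails`, `frame` and `sample` are hypothetical names, the `example.com` address is made up, and `na=False` reflects my assumption that missing values should count as invalid.

```python
import pandas as pd


def invalid_emails(frame, colname, pattern=r"^.+@.+\..{2,}$"):
    """Return the values in frame[colname] that do not match pattern."""
    # na=False (an assumption): treat missing values as non-matching
    mask = frame[colname].str.match(pattern, na=False)
    # ~mask selects the rows where the pattern did NOT match
    return frame.loc[~mask, colname].tolist()


# made-up sample data for illustration
sample = pd.DataFrame({'email': ['testgmail.com', 'bob@example.com']})

for address in invalid_emails(sample, 'email'):
    print(f'{address} is an invalid email')
# prints: testgmail.com is an invalid email
```

Passing the data frame in as an argument, instead of relying on a module-level `df`, keeps the function usable on any frame.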

Upvotes: 1

Swifty

Reputation: 3419

Ok, here's my take on your question (I've removed the "errors" argument until I understand what it's supposed to be/do):

import pandas as pd
import re

d=[['Automotive','testgmail.com','bob','smith'],
   ['Automotive','[email protected]','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])

def Prospect(colname):
    email_regex = r"^.+@.+\..{2,}$"
    wrong=[]
    for this_email in df[colname]:
        # collect a message for every address that fails the pattern
        if not re.search(email_regex, this_email):
            wrong.append(f'{this_email} is an invalid email')
    return wrong

print(Prospect('email'))
# ['testgmail.com is an invalid email']
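
One caveat with a loop around `re.search`: it raises a `TypeError` for non-string values such as missing entries, which real data frames often contain. Below is a minimal sketch of a guard, assuming missing values should simply be reported as invalid. `find_invalid_emails`, `frame`, `EMAIL_REGEX` and `sample` are hypothetical names, and the `example.com` address is made up for illustration.

```python
import re

import pandas as pd

# same pattern as in the question, compiled once
EMAIL_REGEX = re.compile(r"^.+@.+\..{2,}$")


def find_invalid_emails(frame, colname):
    """Return one message per value in frame[colname] that fails the check."""
    wrong = []
    for value in frame[colname]:
        # non-strings (e.g. missing values) are treated as invalid (an assumption)
        if not isinstance(value, str) or not EMAIL_REGEX.search(value):
            wrong.append(f'{value} is an invalid email')
    return wrong


# made-up sample data, including one missing entry
sample = pd.DataFrame({'email': ['testgmail.com', 'bob@example.com', None]})
print(find_invalid_emails(sample, 'email'))
```

The `isinstance` test runs first, so `EMAIL_REGEX.search` is only ever called on strings.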

Upvotes: 0

Noah

Reputation: 632

import pandas as pd
import re

d=[['Automotive','testgmail.com','bob','smith'],
   ['Automotive','[email protected]','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])

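# raw string so that \. reaches the regex engine unchanged;
# double quotes allow the plain apostrophe inside the character class
email_regex = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"

# print a verdict for each address in the column
for email in df["email"]:
    if re.search(email_regex, email):
        print("This is a valid email: " + email)
    else:
        print("This is an invalid email: " + email)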

Results in:

This is an invalid email: testgmail.com
This is a valid email: [email protected]

Process finished with exit code 0
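
Note that this pattern and the one from the question do not accept the same set of strings, so it is worth checking which behaviour you want. The sketch below compares the two on a few made-up addresses. `LOOSE`, `STRICT`, `compare` and `samples` are hypothetical names, and writing the stricter pattern with a plain apostrophe in the character class is my assumption about what was intended.

```python
import re

# pattern from the question: anything, '@', anything, a dot, 2+ characters
LOOSE = re.compile(r"^.+@.+\..{2,}$")
# stricter pattern: restricted character sets, dot after '@' is optional
STRICT = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
)


def compare(address):
    """Return (loose_ok, strict_ok) for a single address."""
    return bool(LOOSE.search(address)), bool(STRICT.search(address))


# made-up addresses chosen to show where the patterns disagree
samples = [
    'testgmail.com',          # no '@': rejected by both
    'bob@example.com',        # accepted by both
    'bob@localhost',          # no dot after '@': only the strict pattern accepts
    'bob smith@example.com',  # space in local part: only the loose pattern accepts
]

for address in samples:
    loose_ok, strict_ok = compare(address)
    print(f'{address!r}: loose={loose_ok}, strict={strict_ok}')
```

Neither pattern is a full validator, which fits the "preliminary step" described in the question.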

Upvotes: 0
