Reputation: 9
First off, I know a regular expression is not the best way to validate an email address, but this is only a preliminary step; a better validation comes later.
I want to create a function that checks whether an email address is valid, but I am not sure how to reference only one column in a data frame.
import pandas as pd
d=[['Automotive','testgmail.com','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])
filename='temp'
I want to keep the code in a function like the one below:
def Prospect(colname,errors):
    wrong=[]
    if #reference to column.str.match(r"^.+@.+\..{2,}$"):
        return
    else:
        error='this is an invalid email'
        wrong.append(error)
        return wrong

print(Prospect(errors,colname))
How do I write a function that references one specific column of a data frame, runs only that column through the check, and prints a message saying that the email is invalid?
P.S. Speed is not a major concern, since the datasets are not large.
desired output:
This is an invalid email
Upvotes: 0
Views: 301
Reputation: 262224
I believe you might want:
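def Prospect(colname, errors, df=df):
    # df[colname] selects the single column; str.match returns one boolean per row
    m = df[colname].str.match(r"^.+@.+\..{2,}$")
    # record an error if at least one row fails the pattern
    if not m.all():
        error = 'this is an invalid email'
        errors.append(error)

errors = []
Prospect('email', errors, df=df)
print(errors)
Output: ['this is an invalid email']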
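Since `str.match` returns one boolean per row, the same mask can also be used to pull out the offending addresses instead of collapsing everything with `.all()`. The following is only a minimal sketch under a few assumptions: the function name `find_invalid_emails`, the constant `EMAIL_PATTERN`, and the second sample address `bob@example.com` are made up for illustration, and `na=False` is used so that missing values count as invalid.

```python
import pandas as pd

# Same preliminary pattern as in the question (not a full email validator)
EMAIL_PATTERN = r"^.+@.+\..{2,}$"

def find_invalid_emails(df, colname):
    """Return the values in df[colname] that do not match EMAIL_PATTERN."""
    # One boolean per row; na=False treats missing values as not matching
    is_valid = df[colname].str.match(EMAIL_PATTERN, na=False)
    # Invert the mask to keep only the rows that failed the check
    return df.loc[~is_valid, colname].tolist()

# Hypothetical sample data: the second address is an illustrative placeholder
df = pd.DataFrame(
    [["Automotive", "testgmail.com", "bob", "smith"],
     ["Automotive", "bob@example.com", "bob", "smith"]],
    columns=["industry", "email", "first", "last"],
)

invalid = find_invalid_emails(df, "email")
for address in invalid:
    print(f"{address} is an invalid email")
```

Passing the data frame in explicitly, rather than relying on a global `df`, keeps the function reusable for other data frames and other column names.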
Upvotes: 1
Reputation: 3419
Ok, here's my take on your question (I've removed the "errors" argument until I understand what it's supposed to be/do):
import pandas as pd
import re
d=[['Automotive','testgmail.com','bob','smith'],
['Automotive','[email protected]','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])
def Prospect(colname):
    email_regex = r"^.+@.+\..{2,}$"
    wrong=[]
    for i in range(len(df)):
        this_email = df[colname][i]
        if re.search(email_regex,this_email):
            continue
        else:
            error=f'{this_email} is an invalid email'
            wrong.append(error)
    return wrong

print(Prospect('email'))
# ['testgmail.com is an invalid email']
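One caveat with the loop above: `df[colname][i]` looks values up by index label, so it depends on the data frame having the default 0..n-1 index. A possible variation, sketched here with hypothetical names (`prospect`, `EMAIL_REGEX`) and made-up sample data, iterates over `Series.items()` instead, which yields each row label together with its value and so works with any index.

```python
import re
import pandas as pd

# Same preliminary pattern as in the question, compiled once
EMAIL_REGEX = re.compile(r"^.+@.+\..{2,}$")

def prospect(df, colname):
    """Collect a message for every value in df[colname] that fails the pattern."""
    wrong = []
    # items() yields (index label, value) pairs regardless of how the index looks
    for label, value in df[colname].items():
        # Non-string values (e.g. NaN) are reported as invalid as well
        if not isinstance(value, str) or not EMAIL_REGEX.search(value):
            wrong.append(f"row {label}: {value} is an invalid email")
    return wrong

# Illustrative data with a non-default index
df = pd.DataFrame(
    {"email": ["ok@example.com", "testgmail.com", "also.bad"]},
    index=[10, 20, 30],
)
messages = prospect(df, "email")
print(messages)
# ['row 20: testgmail.com is an invalid email', 'row 30: also.bad is an invalid email']
```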
Upvotes: 0
Reputation: 632
import pandas as pd
import re
d=[['Automotive','testgmail.com','bob','smith'],
['Automotive','[email protected]','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])
# raw string so that \. reaches the regex engine unchanged
email_regex = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
for email in df["email"]:
    if re.search(email_regex, email):
        print("This is a valid email: " + email)
    else:
        print("This is an invalid email: " + email)
Results in:
This is an invalid email: testgmail.com
This is a valid email: [email protected]
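Note that this pattern and the one from the question do not accept the same set of strings. As a rough illustration (the names `LOOSE` and `STRICT` and the sample addresses are made up for this sketch), the question's pattern requires a dot somewhere after the `@` but allows any character, while the stricter character-class pattern rejects spaces yet accepts a domain with no dot at all.

```python
import re

# Pattern from the question: anything, "@", anything, ".", at least two characters
LOOSE = re.compile(r"^.+@.+\..{2,}$")
# Character-class pattern from this answer, written as a raw string
STRICT = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
)

# Illustrative sample addresses
samples = ["bob@example.com", "user@localhost", "a b@example.com", "testgmail.com"]

# Map each sample to (matches LOOSE, matches STRICT)
results = {s: (bool(LOOSE.search(s)), bool(STRICT.search(s))) for s in samples}
for s, (loose_ok, strict_ok) in results.items():
    print(f"{s!r}: loose={loose_ok}, strict={strict_ok}")
```

Which behavior is preferable depends on what the later, more thorough validation step is expected to catch.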
Upvotes: 0