Reputation: 1
My idea is to find every email in a sentence and replace each one with a different random email (anonymization). But I can't get the result I want: either every email is replaced with the same one, or I get an error (list index out of range).
Input: email = "[email protected] sent it to [email protected]"
Desired output: email = "[email protected] sent it to [email protected]"
import re

random_emails = ["albert", "john", "mary"]

def find_email(email: str):
    result = email
    i = 0
    email_address = r"\S+@"
    for text in email:
        result = re.sub(email_address, random_emails[i] + "@", result)
        i += 1
    return result

print(find_email(email))
Upvotes: -1
Reputation: 126
You don't need a for loop, and I think your regex can be improved:
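As a minimal sketch of why the original loop misbehaves (the addresses below are hypothetical): a single `re.sub` call already replaces every match, so each later pass just overwrites the previous name everywhere. And because `for text in email` runs once per character, `i` soon runs past the end of the three-item list, which is where the `IndexError` comes from.

```python
import re

# Hypothetical sample input; any two addresses show the same effect.
text = "alice@example.com sent it to bob@example.org"
pattern = r"\S+@"

# One call already replaces every match, so both local parts become "albert".
once = re.sub(pattern, "albert@", text)
print(once)  # albert@example.com sent it to albert@example.org

# A second pass just overwrites "albert@" with the next name everywhere.
twice = re.sub(pattern, "john@", once)
print(twice)  # john@example.com sent it to john@example.org

# "for text in email" loops once per character, not once per address.
print(len(text))  # 44, while random_emails has only 3 names
```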
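import re

def find_email(email):
    result = email
    # groups: 1st name@, server + text in between, 2nd name@, 2nd server
    email_address = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"
    a = 'AAAAA@'
    b = 'BBBBB@'
    # keep groups 2 and 4, replace the two names (groups 1 and 3) with a and b
    result = re.sub(email_address, rf'{a}\2{b}\4', result)
    return result

email = "[email protected] sent it to [email protected]"
print(find_email(email))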
Explanation:
You can create capture groups:
\1 = name of the 1st email (with the @)
\2 = its server plus the text in between
\3 = name of the 2nd email (with the @)
\4 = server of the 2nd email (server.com)
Now you just need to replace \1 and \3 with whatever you want.
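A quick way to check what each group captures, assuming a hypothetical input with two addresses. Note that group 2 ends with the space before the second address, and that the pattern is written for exactly two emails.

```python
import re

# Same pattern as the answer above.
pattern = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"
text = "alice@example.com sent it to bob@example.org"  # hypothetical input

groups = re.match(pattern, text).groups()
for number, value in enumerate(groups, start=1):
    print(number, repr(value))
# 1 'alice@'
# 2 'example.com sent it to '
# 3 'bob@'
# 4 'example.org'

# Keep \2 and \4, swap the two names (\1 and \3).
masked = re.sub(pattern, r"AAAAA@\2BBBBB@\4", text)
print(masked)  # AAAAA@example.com sent it to BBBBB@example.org
```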
Example 2: your new routine

import re
from random import randint

random_emails = ["albert", "john", "mary"]

def find_email(email):
    result = email
    email_address = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"
    # pick two random indexes into random_emails
    first = randint(0, 2)
    second = randint(0, 2)
    # draw again until the two names differ
    while first == second:
        second = randint(0, 2)
    result = re.sub(email_address, rf'{random_emails[first]}@\2{random_emails[second]}@\4', result)
    return result

email = "[email protected] sent it to [email protected]"
print(find_email(email))
I used random to pick a random index into the list, and the "while first == second:" loop just makes sure the first and second emails are not the same.
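If the goal is just two different names, `random.sample` from the standard library already returns distinct items, so the retry loop and the hard-coded `randint(0, 2)` bounds can be dropped. A minimal sketch:

```python
import random

random_emails = ["albert", "john", "mary"]

# random.sample draws k distinct items, so no "while first == second" retry
# is needed, and the list length is no longer hard-coded.
first, second = random.sample(random_emails, 2)
print(first, second)
```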
Upvotes: 0
Reputation: 2113
I found a solution, but note that identical emails will be anonymized the same way. Try this:
import re

email = "[email protected] sent it to [email protected]"
random_emails = ["albert", "john", "mary"]

def find_email(email: str):
    result = email
    i = 0
    email_address = r"\S+@"
    # collect every "name@" prefix first, then replace them one by one
    regex_matches = re.findall(email_address, email)
    for match in regex_matches:
        # str.replace swaps all occurrences, so identical emails get the same name
        result = result.replace(match, random_emails[i] + "@")
        i += 1
    return result

print(find_email(email))
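Another option, sketched here as an alternative rather than as any of the answers above, is to pass a function to `re.sub`: it is called once per match, so each address can get its own name without a manual index. The `anonymize` helper and the `LOCAL_PART` pattern are hypothetical names, and the pattern is a simplification (a run of characters that are neither whitespace nor `@`, directly before an `@`), not a full email validator. A dict keeps identical local parts mapped to the same name, in line with the caveat above.

```python
import random
import re

random_emails = ["albert", "john", "mary"]

# Assumption: a "local part" is any run of non-whitespace, non-@ characters
# followed by "@" (the lookahead keeps the "@" itself out of the match).
LOCAL_PART = re.compile(r"[^\s@]+(?=@)")


def anonymize(text: str, names: list[str]) -> str:
    """Replace each distinct email local part with a different random name."""
    pool = random.sample(names, len(names))  # shuffled copy; names are not reused
    mapping: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        local = match.group(0)
        if local not in mapping:
            if not pool:
                raise ValueError("more distinct emails than replacement names")
            mapping[local] = pool.pop()
        return mapping[local]

    # re.sub calls `replace` once per match, so each address gets its own name.
    return LOCAL_PART.sub(replace, text)


text = "alice@example.com sent it to bob@example.org"  # hypothetical input
print(anonymize(text, random_emails))
```

Because names are drawn without replacement, the function raises `ValueError` when there are more distinct addresses than names, instead of the `IndexError` from the question.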
Upvotes: 0