Daniel Teixeira

Reputation: 1

Replace on for loop list

My idea is to find every email in a sentence and replace it with a different random email (anonymization). But I can't get the result I want: either every email is replaced with the same one, or I get an error (list index out of range).

input: email = "[email protected] sent it to [email protected]"

output I want email = "[email protected] sent it to [email protected]"

random_emails = ["albert", "john", "mary"]


def find_email(email: str):
    result = email
    i = 0
    email_address = r"\S+@"
    for text in email:
        result = re.sub(email_address, random_emails[i] + "@", result)
        i += 1
    return result

print(find_email(email))

Upvotes: -1

Views: 67

Answers (2)

Rica Gurgel

Reputation: 126

You don't need a for loop, and I think your regex can be improved:

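import re


def find_email(email):
    # Groups: (1) first local part + "@", (2) first server and the text in between,
    # (3) second local part + "@", (4) second server.
    email_address = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"
    a = 'AAAAA@'
    b = 'BBBBB@'
    # Keep groups 2 and 4; substitute new local parts for groups 1 and 3.
    return re.sub(email_address, rf'{a}\2{b}\4', email)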


email = "[email protected] sent it to [email protected]"
print(find_email(email))

Explaining:

You can create substitution groups:

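\1 = the local part of the first email (including the @)
\2 = the first email's server plus the text up to the second email
\3 = the local part of the second email (including the @)
\4 = the second email's server (e.g. server.com)

Now you just need to replace groups \1 and \3 with anything you want and keep \2 and \4 as they are. Note that this pattern expects exactly two emails in the sentence.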
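To see what each group captures, here is a small sketch. The sample sentence and its addresses are made up for illustration (the ones in the question are obfuscated), and `\g<2>` is just the unambiguous spelling of `\2` in a replacement string.

```python
import re

# Hypothetical sample text; these addresses are invented for the demo.
text = "alice@example.com sent it to bob@example.org"
pattern = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"

# Inspect the four captured groups.
groups = re.search(pattern, text).groups()
print(groups)

# Replace groups 1 and 3, keep groups 2 and 4 untouched.
result = re.sub(pattern, r"AAAAA@\g<2>BBBBB@\g<4>", text)
print(result)
```

Because `.*` is greedy, group 2 swallows everything up to the last email in the sentence, so any email in the middle would be left as it is.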

Example 2: your new routine

import re
from random import randint

random_emails = ["albert", "john", "mary"]


def find_email(email):
    result = email
    email_address = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"
    first = randint(0, 2)
    second = randint(0, 2)
    while first == second:
        second = randint(0, 2)
    result = re.sub(email_address, rf'{random_emails[first]}@\2{random_emails[second]}@\4', result)
    return result


email = "[email protected] sent it to [email protected]"
print(find_email(email))

I used randint to pick random indexes into the list. The "while first == second:" loop just makes sure the first and second emails don't get the same name.
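If the sentence can contain more than two emails, one possible generalization (a sketch, not part of the routine above) is to pass a function to `re.sub` and draw the names with `random.sample`, which never repeats a pick. The function name `anonymize` and the sample sentence are assumptions made for this example; the pattern `\S+@` is the one from the question.

```python
import random
import re

random_emails = ["albert", "john", "mary"]


def anonymize(text, names=random_emails):
    """Replace each local part matched by \\S+@ with a distinct random name."""
    pattern = r"\S+@"
    count = len(re.findall(pattern, text))
    if count > len(names):
        raise ValueError("not enough replacement names for all emails")
    # random.sample picks `count` distinct names, so no two emails share one.
    picks = iter(random.sample(names, count))
    # re.sub calls the lambda once per match, left to right.
    return re.sub(pattern, lambda match: next(picks) + "@", text)


# Hypothetical sample text; the addresses are invented for the demo.
print(anonymize("alice@example.com sent it to bob@example.org"))
```

This avoids both the while loop and the hard-coded two-email pattern, at the cost of needing at least as many names as there are emails.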

Upvotes: 0

Phoenixo

Reputation: 2113

I found a solution, but note that identical emails will be anonymized in the same way, and the list must contain at least as many names as there are matches. Try this:

import re

email = "[email protected] sent it to [email protected]"
random_emails = ["albert", "john", "mary"]

def find_email(email: str):
    result = email
    i = 0
    email_address = r"\S+@"
    regex_matches = re.findall(email_address, email)
    for match in regex_matches:
        result = result.replace(match, random_emails[i] + "@")
        i += 1
    return result

print(find_email(email))
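If you want the "identical emails get the same replacement" behaviour to be explicit, a variant is to keep a dictionary from each original match to its replacement and let `re.sub` call a function per match. This is only a sketch: `anonymize_consistently` and the sample sentence are names I made up, and names are handed out in list order rather than randomly.

```python
import re

random_emails = ["albert", "john", "mary"]


def anonymize_consistently(text, names=random_emails):
    """Map each distinct match of \\S+@ to its own name, reusing it on repeats."""
    mapping = {}

    def replace(match):
        original = match.group(0)
        if original not in mapping:
            if len(mapping) >= len(names):
                raise ValueError("not enough replacement names")
            # Only consume a new name the first time an address is seen.
            mapping[original] = names[len(mapping)] + "@"
        return mapping[original]

    return re.sub(r"\S+@", replace, text)


# Hypothetical sample text; the addresses are invented for the demo.
sample = "alice@example.com wrote to bob@example.org and alice@example.com"
anonymized = anonymize_consistently(sample)
print(anonymized)
```

Unlike the loop with `i += 1`, a repeated address does not use up an extra name here, so the list only needs one entry per distinct email.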

Upvotes: 0
