Reputation: 4636
I'm trying to build a script that creates different variations of a person's name in order to test their email address. Basically, what I want the script to do is:
If I input "John Smith" I need to get in return a list containing [john, johnsmith, john.smith, john_smith, smith, jsmith, j.smith, smithj, smith.j, j_smith, smith_j, smithjohn, smith.john, smith_john, etc]
If I input "John May Smith" I need to get in return a list containing [john, johnmay, johnsmith, john.may, john.smith, john_may, john_smith, jmay, jsmith, j.may, j.smith, j_may, j_smith, johnmaysmith, john.may.smith, john_may_smith, jms, johnms, john.m.s, john_m_s, jmsmith, j.m.smith, j_m_smith, j.m.s, j_m_s, jmays, j.may.s, j_may_s, etc]
Technically, it would be three lists of name parts, [j, john], [m, may] and [s, smith], that would be mixed in different orders, with the parts optionally separated by "." or "_".
John Smith and John May Smith are only examples; I should be able to enter any name, decompose it and mix its parts, initials and separators ('.' and '_').
To decompose a name I'm using the following:
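import nameparser

name = "John May Smith"
name = nameparser.HumanName(name)
parts = []
for i in name:
    # Store the initial and the full word, lowercased to match the list shown below
    j = [i[0].lower(), i.lower()]
    parts.append(j)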
This way, parts ends up like this:
[['j', 'john'], ['m', 'may'], ['s', 'smith']]
Note that the list in this case has three sublists; however, it could have had 2, 4, 5 or 6.
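A dependency-free sketch of the same decomposition, assuming a plain whitespace split is an acceptable stand-in for nameparser.HumanName (it will not recognise titles or suffixes such as "Dr." or "Jr."). The helper name `name_parts` is hypothetical and not part of the original code:

```python
def name_parts(full_name):
    """Return [[initial, word], ...] for each whitespace-separated word, lowercased."""
    # Assumption: str.split() stands in for nameparser.HumanName here.
    words = full_name.lower().split()
    return [[word[0], word] for word in words]


print(name_parts("John May Smith"))
# [['j', 'john'], ['m', 'may'], ['s', 'smith']]
```

The number of sublists simply follows the number of words in the input, so the same helper covers names with 2, 4 or more parts.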
I created another list called separators:
separators=['.','_']
My question is: What is the best way to mix those lists to create a list of possible email local-parts* as described in the example above? I've been burning my brain to find a way to do it for a few days but haven't been able to.
*The local-part is what comes before the @ in an email address (for an address starting with jmaysmith@, the local-part would be "jmaysmith").
Upvotes: 0
Views: 87
Reputation: 1703
The following code should do what you want:
from nameparser import HumanName
from itertools import product, chain, combinations

def name_combinations(name):
    name = HumanName(name)
    parts = []
    ret = []
    for i in name:
        # Keep each word on its own and pair it with its initial
        j = [i[0].lower(), i.lower()]
        ret.append(i.lower())
        parts.append(j)
    separators = ['', '.', '_']
    # Combine every in-order selection of 2 or more parts
    for r in range(2, len(parts) + 1):
        for c in combinations(parts, r):
            # l[0] is the separator; l[1:] holds one choice per part
            ret = chain(ret, map(lambda l: l[0].join(l[1:]), product(separators, *c)))
    return ret

print(list(name_combinations("John May Smith")))
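To make the `product(separators, *c)` line easier to follow, here is a minimal sketch of that one step in isolation, with the two part lists for "John Smith" written out by hand (the variable names `first`, `last` and `variants` are only for illustration):

```python
from itertools import product

separators = ['', '.', '_']
first = ['j', 'john']
last = ['s', 'smith']

# product() yields tuples such as ('.', 'john', 'smith'): one separator,
# followed by one choice from each name part.
variants = [combo[0].join(combo[1:]) for combo in product(separators, first, last)]

print(variants)
# ['js', 'jsmith', 'johns', 'johnsmith', 'j.s', 'j.smith', 'john.s', 'john.smith',
#  'j_s', 'j_smith', 'john_s', 'john_smith']
```

Putting the separator list first in the product means each tuple carries its own separator, so a single `join` call builds the finished string.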
I have not seen jms, j.s or js in your examples. If that is intentional, feel free to clarify what should be excluded.
For reference, the output is:
>>> print(list(name_combinations("John Smith")))
['john', 'smith', 'js', 'jsmith', 'johns', 'johnsmith', 'j.s', 'j.smith', 'john.s', 'john.smith', 'j_s', 'j_smith', 'john_s', 'john_smith']
>>> print(list(name_combinations("John May Smith")))
['john', 'may', 'smith', 'jm', 'jmay', 'johnm', 'johnmay', 'j.m', 'j.may', 'john.m', 'john.may', 'j_m', 'j_may', 'john_m', 'john_may', 'js', 'jsmith', 'johns', 'johnsmith', 'j.s', 'j.smith', 'john.s', 'john.smith', 'j_s', 'j_smith', 'john_s', 'john_smith', 'ms', 'msmith', 'mays', 'maysmith', 'm.s', 'm.smith', 'may.s', 'may.smith', 'm_s', 'm_smith', 'may_s', 'may_smith', 'jms', 'jmsmith', 'jmays', 'jmaysmith', 'johnms', 'johnmsmith', 'johnmays', 'johnmaysmith', 'j.m.s', 'j.m.smith', 'j.may.s', 'j.may.smith', 'john.m.s', 'john.m.smith', 'john.may.s', 'john.may.smith', 'j_m_s', 'j_m_smith', 'j_may_s', 'j_may_smith', 'john_m_s', 'john_m_smith', 'john_may_s', 'john_may_smith']
Upvotes: 1