Reputation: 55
I want to match each value in the loaded_list DataFrame against the items in domain_list. If an email in loaded_list contains a domain from domain_list, it should be added to match_list.
I have tried several approaches: contains(domain_list), loaded_list == domain_list (both by row and by DataFrame column name), and the isin method from pandas. None of them worked.
import pandas as pd

loaded_list = []
match_list = []
domain_list = ['@hotmail.co.uk', '@gmail.com']
# Convert the list to a DataFrame
domain_list = pd.DataFrame(domain_list, columns=['Email Address'])
# The with block closes the file on exit, so no explicit close() is needed
with open(self.breach_file, 'r', encoding='utf-8-sig') as breach_file:
    found_reader = pd.read_csv(breach_file, sep=':', names=['Email Address'], engine='c')
    loaded_list = found_reader
    print("List Parsed... Enumerating Content Types")
match_list = ???
print(f"Match:\n {match_list}")
The expected outcome is for match_list to contain the emails in loaded_list that include a domain from domain_list.
The methods I tried (isin, contains()) raised various errors. I don't want to use for loops, as I'll be processing large amounts of data.
List Examples
loaded_list:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
domain_list:
@gmail.com
@hotmail.co.uk
Upvotes: 1
Views: 75
Reputation: 125
Did you try generating a regex from your domain_list by concatenating the values separated by "|", then filtering loaded_list with the generated pattern?
Example:
In[1]: loaded_list=pd.Series([
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]"
])
In[2]: domain_list=pd.Series([
"@gmail.com",
"@hotmail.co.uk"
])
In[3]: import re
In[4]: match_list = loaded_list[loaded_list.str.contains(domain_list.apply(re.escape).str.cat(sep="|"))]
In[5]: match_list
Out[5]:
0 [email protected]
2 [email protected]
dtype: object
I escaped all special characters in domain_list with re.escape (so that characters such as "." are matched literally), then used the str.cat method to join the escaped domains into a single pattern of alternatives separated by "|".
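If loaded_list is a DataFrame with an 'Email Address' column, as in the question, the same idea could be sketched roughly as below. This is only an illustration under a few assumptions: the sample addresses are made up, domain_list is kept as a plain Python list, the "$" anchor (not part of the pattern above) is added so that a domain only matches at the end of an address, and na=False is used so that missing values count as non-matches.

```python
import re

import pandas as pd

# Hypothetical sample data; these addresses are made up for illustration.
loaded_list = pd.DataFrame({
    "Email Address": [
        "alice@gmail.com",
        "bob@yahoo.com",
        "carol@hotmail.co.uk",
        "dave@gmail.com.example.org",
        None,
    ]
})
domain_list = ["@hotmail.co.uk", "@gmail.com"]

# Escape each domain so "." is matched literally, join the alternatives
# with "|", and anchor the group with "$" so the domain must end the address.
pattern = "(?:" + "|".join(re.escape(d) for d in domain_list) + ")$"

# na=False treats missing values as non-matches instead of propagating NaN.
mask = loaded_list["Email Address"].str.contains(pattern, regex=True, na=False)
match_list = loaded_list[mask]

print(f"Match:\n{match_list}")
```

Dropping the "$" anchor gives the plain "contains" behaviour of the one-liner above. The whole filter is a single vectorized call, so no explicit for loop over the rows is needed.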
Upvotes: 1