Reputation: 1217
There is a string:
str = 'Please Contact Prof. Zheng Zhao: <a href="mailto:[email protected]">[email protected]</a> for details, or our HR: [email protected]'
I wanted to extract all of the email addresses in that string, so I used:
p = r'[\w\.]+@[\w\.]+'
re.findall(p, str)
And the result was:
['[email protected]', '[email protected]', '[email protected]']
The first and second results are duplicates. How can I prevent this from happening?
Upvotes: 2
Views: 3024
Reputation: 3785
You can remove duplicates using a set. A set is an unordered collection that can't contain duplicates. In this case, you don't care about letter case, so lowercasing the results first lets you detect duplicates properly.
import re
s = 'Please Contact Prof. Zheng Zhao: <a href="mailto:[email protected]">[email protected]</a> for details, or our HR: [email protected]'
p = r'[\w\.]+@[\w\.]+'
list(set(result.lower() for result in re.findall(p, s)))
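One caveat: a set does not keep the matches in the order they were found. If order matters, a minimal sketch (assuming Python 3.7+, where dicts preserve insertion order) is to deduplicate with dict.fromkeys instead. The sample string and the example.com addresses below are placeholders made up for illustration, not the addresses from the question.

```python
import re

# Hypothetical sample text; these addresses are placeholders for illustration only.
s = ('Please contact <a href="mailto:Prof@Example.com">prof@example.com</a> '
     'for details, or our HR: hr@example.com')
p = r'[\w.]+@[\w.]+'

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while preserving first-seen order.
unique_emails = list(dict.fromkeys(m.lower() for m in re.findall(p, s)))
print(unique_emails)
```

The result is still a list, so it can be used anywhere the set-based version was.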
Upvotes: 5