
How to remove duplicate results of a regular expression (re) search in Python

I have a string:

s = 'Please Contact Prof. Zheng Zhao: <a href="mailto:[email protected]">[email protected]</a> for details, or our HR: [email protected]'

I wanted to extract all of the email addresses from that string, so I wrote:

import re

p = r'[\w\.]+@[\w\.]+'
re.findall(p, s)

And the result was:

['[email protected]', '[email protected]', '[email protected]']

The first and second results are duplicates: the same address appears once inside the `mailto:` link and once as the link text. How can I prevent this?


Jeremy McGibbon

You can remove duplicates using a set. A set is an unordered collection that cannot contain duplicate elements. In this case letter case doesn't matter, so lowercasing the results first ensures that addresses differing only in capitalization are also treated as duplicates.
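To make the set and lowercasing behaviour concrete, here is a small sketch on a hand-written list of matches. The addresses are made up, because the real ones in the question are redacted. It also shows one point to be aware of: a set does not keep the order in which the addresses appeared. If that order matters, `dict.fromkeys` (an alternative, not part of the original answer) removes duplicates while keeping first-seen order, since dicts preserve insertion order in Python 3.7+.

```python
# Hypothetical matches: the real addresses in the question are redacted.
matches = ['Alice@Example.com', 'alice@example.com', 'hr@example.com']

# A plain set treats addresses that differ only in case as distinct.
print(len(set(matches)))  # 3

# Lowercasing first lets the set collapse them.
print(len({m.lower() for m in matches}))  # 2

# A set has no defined order; dict.fromkeys keeps the first-seen order
# (dicts preserve insertion order in Python 3.7+).
unique_ordered = list(dict.fromkeys(m.lower() for m in matches))
print(unique_ordered)  # ['alice@example.com', 'hr@example.com']
```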

import re

s = 'Please Contact Prof. Zheng Zhao: <a href="mailto:[email protected]">[email protected]</a> for details, or our HR: [email protected]'

p = r'[\w\.]+@[\w\.]+'
# Lowercase every match, then let the set discard duplicates.
emails = list(set(result.lower() for result in re.findall(p, s)))
print(emails)
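Another option (a different technique, not part of the original answer) is to avoid the duplicate at match time instead of removing it afterwards: a negative lookbehind can skip any address that directly follows `mailto:`. This is a sketch that assumes the duplicate always comes from a `mailto:` link, as in the question's HTML; an address that appears twice in ordinary text would still need the set-based deduplication. The addresses are again made up.

```python
import re

# Hypothetical sample text (the addresses in the question are redacted).
text = ('Please contact <a href="mailto:alice@example.com">alice@example.com</a> '
        'for details, or our HR: hr@example.com')

# (?<!mailto:) rejects a match that starts right after "mailto:".
# (?<![\w.]) stops the engine from starting a match in the middle of an
# address instead (e.g. at "lice@..." just after the rejected "a").
pattern = r'(?<!mailto:)(?<![\w.])[\w.]+@[\w.]+'

emails = re.findall(pattern, text)
print(emails)  # ['alice@example.com', 'hr@example.com']
```

Both lookbehinds are fixed-width, which Python's `re` module requires for lookbehind assertions. Without the second one, the pattern would return a truncated address such as `lice@example.com`.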
