Omitting Numbers with regex

Question

This is, I fear, frighteningly simple, but I can't make it work (and I can't find the answer through a search). I am scraping a website for all words in italics (the ones I want are in groups of two words--they are binomial scientific names), but I don't want any numbers returned.

The regex I used: <i>(.+?)</i>

worked great but it pulled the numbers. I thought using \D would work, but it didn't. What am I doing wrong?

hwnd · Accepted Answer

Yes, I basically want to strip integers from any string inside the tags.

Using Python's re.findall, then looping through the matches and removing the digit characters with re.sub, should work for you.

import re

# htmltext is the page source you have already scraped
pattern = re.compile(r'(?<=<i>).*?(?=</i>)')

for names in re.findall(pattern, htmltext):
    print(re.sub(r'[0-9]', '', names))
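To see the strip-the-digits approach end to end, here is a minimal, self-contained sketch. The sample string and the helper name strip_digits are made up for the demonstration, and it assumes the names sit in plain <i>...</i> tags with no attributes. It also trims leftover whitespace and drops entries that were nothing but digits, which goes slightly beyond the loop above.

```python
import re

# Hypothetical sample standing in for the scraped page source.
sample_html = "<p><i>Homo sapiens</i>, <i>Pan troglodytes 2</i> and <i>1999</i></p>"

def strip_digits(html):
    """Return the text of every <i>...</i> element with digits removed."""
    names = re.findall(r'(?<=<i>).*?(?=</i>)', html)
    # Remove the digits, then trim the whitespace they leave behind.
    cleaned = [re.sub(r'[0-9]', '', name).strip() for name in names]
    # Drop entries that consisted only of digits and are now empty.
    return [name for name in cleaned if name]

print(strip_digits(sample_html))
```

For anything beyond simple, well-formed markup, an HTML parser is likely a safer way to pull out the italic text before cleaning it with re.sub.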

To find the matches that do not contain numbers:

# Excluding '<' as well keeps a match from running across several tags.
matches = re.findall(r'(?<=<i>)[^0-9<]*(?=</i>)', htmltext)
print(matches)
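A variation on the filtering idea, sketched with a capturing group instead of lookarounds. The sample string and the function name names_without_digits are hypothetical, and plain <i>...</i> tags are assumed. It may also hint at why \D on its own did not help: a lone \D matches only a single non-digit character, so it needs a quantifier and has to replace the dot rather than sit next to it.

```python
import re

# Hypothetical sample: two clean names and one italic run containing a number.
sample_html = "<i>Homo sapiens</i> (1758), <i>Pan troglodytes</i>, <i>Fig. 3</i>"

def names_without_digits(html):
    """Return only the <i>...</i> contents that contain no digits at all."""
    # [^0-9<]+ means one or more characters that are neither digits nor '<',
    # so a match cannot contain a number or run past its own closing tag.
    return re.findall(r'<i>([^0-9<]+)</i>', html)

print(names_without_digits(sample_html))
```

With a single capturing group, re.findall returns just the captured text, so the tags themselves are left out of the results.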
