How to search for specific text in a span in BeautifulSoup?

Question

I have the following HTML:


Engineering Mechanics


Engineering Mechanics

...

My code for getting the span text is:

trs = soup.find_all('tr')
for tr in trs:
    spans = tr.find_all('span')
    if spans.id == "ContentPlaceHolder1_grd_reminder_Label***":
        print spans.string

In the line spans.id == "ContentPlaceHolder1_grd_reminder_Label***", I want to match every id that starts with the same text but ends with a different number (in the HTML above, the trailing part is 1_0). But my code raises an error. How can I solve it?

alecxe · Accepted Answer

First of all, your current code does not work for multiple reasons:

spans is actually a ResultSet object (a list of tags), and it does not have an id attribute
even if spans were a single Tag instance, spans.id would not give you the id attribute; it would actually mean spans.find("id"), which would return None. To get the attribute value of a Tag, use it like a dictionary, e.g. span["id"]
you cannot do a partial match with == and * in the string
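
To make the first two points concrete, here is a minimal sketch. The one-row table and the id value are made up for illustration, since the original HTML is not shown here; only the Beautiful Soup calls themselves are taken from the points above.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table; the id value is an assumption for illustration.
html = (
    '<table><tr><td>'
    '<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>'
    '</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

tr = soup.find("tr")
spans = tr.find_all("span")    # a ResultSet (list-like), not a single Tag
span = spans[0]                # a single Tag

attr_via_dot = span.id         # shorthand for span.find("id"): looks for an <id> child tag
attr_via_key = span["id"]      # dictionary-style access returns the attribute value
attr_via_get = span.get("id")  # like dict.get: None instead of KeyError when missing

print(type(spans).__name__)
print(attr_via_dot)
print(attr_via_key)
```

Here span.id comes back as None because the span has no child tag named id, while span["id"] and span.get("id") both return the attribute value.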

We can do better and solve it in a cleaner way anyway.

The easiest thing to do is to use the "starts with" CSS selector:

for elm in soup.select("span[id^=ContentPlaceHolder1_grd_reminder_Label]"):
    print(elm.get_text())
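
For a version that can be run end to end, the sketch below applies the same selector to a small sample document. The markup, the ids ending in 1_0 and 1_1, and the SomethingElse_0 span are assumptions standing in for the real page. The attribute value is quoted here, which is optional for a plain identifier like this one but is needed if the prefix contains characters that are not valid in a CSS identifier.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the ids and the table layout are assumptions for illustration.
html = """
<table>
  <tr><td><span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span></td></tr>
  <tr><td><span id="ContentPlaceHolder1_grd_reminder_Label1_1">Engineering Mechanics</span></td></tr>
  <tr><td><span id="SomethingElse_0">Not a match</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# [id^="..."] keeps only elements whose id attribute starts with the given prefix.
matches = soup.select('span[id^="ContentPlaceHolder1_grd_reminder_Label"]')
texts = [elm.get_text() for elm in matches]
print(texts)
```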

Or, with find_all(), you can use either a filtering function:

for elm in soup.find_all("span", id=lambda value: value and value.startswith("ContentPlaceHolder1_grd_reminder_Label")):
    print(elm.get_text())

Or, a regular expression:

import re

for elm in soup.find_all("span", id=re.compile("^ContentPlaceHolder1_grd_reminder_Label")):
    print(elm.get_text())

where ^ denotes the beginning of a string.
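
The anchor matters because, per its documentation, Beautiful Soup applies a compiled pattern with the pattern's search() method, which can match anywhere in the string. The sketch below uses only the re module to show the difference; the two id values are made up for illustration, and re.escape is an extra precaution that is not part of the answer's code.

```python
import re

prefix = "ContentPlaceHolder1_grd_reminder_Label"

# Hypothetical id values, made up for illustration.
ids = [
    "ContentPlaceHolder1_grd_reminder_Label1_0",
    "Wrapper_ContentPlaceHolder1_grd_reminder_Label1_0",
]

# re.escape guards against regex metacharacters in the prefix (none here).
anchored = re.compile("^" + re.escape(prefix))
unanchored = re.compile(re.escape(prefix))

# search() scans the whole string, so only the ^ restricts the match to the start.
anchored_hits = [i for i in ids if anchored.search(i)]
unanchored_hits = [i for i in ids if unanchored.search(i)]

print(anchored_hits)
print(unanchored_hits)
```

Without the ^, the second id would also be accepted, because the prefix appears in the middle of it.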
