Reputation: 43
I have the HTML as:
<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
</tr>
<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span>
</tr>
...
My code for getting the span text is:
trs = soup.find_all('tr')
for tr in trs:
    spans = tr.find_all('span')
    if spans.id == "ContentPlaceHolder1_grd_reminder_Label***":
        print spans.string
In the line spans.id == "ContentPlaceHolder1_grd_reminder_Label***", I want to match all the ids that start with the same text but end with different numbers (like the trailing 1_0 in the contents above). But my code raises an error. How can I solve it?
Upvotes: 0
Views: 1741
Reputation: 474191
First of all, your current code does not work for multiple reasons:
1. spans is actually a ResultSet object, a list of tags, and it does not have an id attribute.
2. Even if spans were a single Tag instance, spans.id would not get you the id attribute: it would actually mean spans.find("id"), which would result in None. To get the attribute value of a Tag, use it like a dictionary, e.g. span["id"].
3. You cannot do a partial match with == and * in the string; the comparison treats * as a literal character.
We can do better and solve it in a cleaner way anyway.
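To make the points about ResultSet and Tag concrete, here is a small sketch. It assumes the bs4 package is installed and uses the built-in "html.parser"; the variable names (first, dot_lookup, id_value, safe_value) are only illustrative.

```python
from bs4 import BeautifulSoup

# Markup copied from the question
html = """
<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span>
</tr>
<tr>
<span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span>
</tr>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet: a list-like collection of Tag objects.
# It has no "id" attribute of its own, so spans.id raises AttributeError.
spans = soup.find_all("span")
first = spans[0]

# Dot access on a Tag searches for a child tag with that name,
# i.e. first.find("id"); there is no <id> element, so this is None.
dot_lookup = first.id

# Attributes are read like dictionary keys.
id_value = first["id"]

# .get() behaves like dict.get(): it returns None if the attribute is missing.
safe_value = first.get("id")

print(dot_lookup)
print(id_value)
```

If an attribute may be absent on some tags, first.get("id") is probably the safer choice, since first["id"] raises KeyError in that case.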
The easiest thing to do is to use the "starts with" CSS selector:
for elm in soup.select("span[id^=ContentPlaceHolder1_grd_reminder_Label]"):
    print(elm.get_text())
Or, if you go via find_all(), you can either use a filtering function:
for elm in soup.find_all("span", id=lambda value: value and value.startswith("ContentPlaceHolder1_grd_reminder_Label")):
    print(elm.get_text())
Or, a regular expression:
import re

for elm in soup.find_all("span", id=re.compile("^ContentPlaceHolder1_grd_reminder_Label")):
    print(elm.get_text())
where ^ denotes the beginning of a string.
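As a quick check that the three approaches agree, here is a self-contained sketch run against the markup from the question. It assumes bs4 is installed (with select() support) and uses the built-in "html.parser". The third span, with id "Other_Label1_0", is a hypothetical addition to show that non-matching ids are skipped, and PREFIX, by_css, by_function and by_regex are illustrative names.

```python
import re
from bs4 import BeautifulSoup

PREFIX = "ContentPlaceHolder1_grd_reminder_Label"  # prefix from the question

# Two spans from the question plus one hypothetical span that must not match
html = """
<tr><span id="ContentPlaceHolder1_grd_reminder_Label1_0">Engineering Mechanics</span></tr>
<tr><span id="ContentPlaceHolder1_grd_reminder_Label1_2">Engineering Mechanics</span></tr>
<tr><span id="Other_Label1_0">Unrelated</span></tr>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. "Starts with" CSS attribute selector
by_css = [elm.get_text() for elm in soup.select('span[id^="%s"]' % PREFIX)]

# 2. Filtering function: "value and ..." guards against spans without an id
by_function = [
    elm.get_text()
    for elm in soup.find_all("span", id=lambda value: value and value.startswith(PREFIX))
]

# 3. Regular expression anchored at the start of the id;
#    re.escape() keeps the prefix literal if it ever contains special characters
by_regex = [
    elm.get_text()
    for elm in soup.find_all("span", id=re.compile("^" + re.escape(PREFIX)))
]

print(by_css)       # ['Engineering Mechanics', 'Engineering Mechanics']
print(by_function)  # same list
print(by_regex)     # same list
```

All three return the text of the two matching spans and leave out the unrelated one, so the choice between them is mostly a matter of taste.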
Upvotes: 1