Roman

Reputation: 3241

Unable to grab text from soup

I have the following HTML:

<table class="stocksTable" summary="株価詳細">
<tr>
<th class="symbol"><h1>(株)みずほフィナンシャルグループ</h1></th>
<td class="stoksPrice realTimChange">
<div class="realTimChangeMod">
</div>
</td>
td class="stoksPrice">191.1</td>
<td class="change"><span class="yjSt">前日比</span><span class="icoUpGreen yjMSt">+2.5(+1.33%)</span></td>
</tr>
</table>

I tried to extract 191.1 from a line containing td class="stoksPrice">191.1</td>.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html)
res = soup.find_all('stoksPrice')
print(res)

But the result is []. How can I find it?

Upvotes: 1

Views: 59

Answers (3)

Keyur Potdar

Reputation: 7238

Since more than one tag carries the stoksPrice class, you can use a CSS attribute selector to match the class attribute exactly.

from bs4 import BeautifulSoup

html = '''<table class="stocksTable" summary="株価詳細">
<tr>
<th class="symbol"><h1>(株)みずほフィナンシャルグループ</h1></th>
<td class="stoksPrice realTimChange">
<div class="realTimChangeMod">
</div>
</td>
<td class="stoksPrice">191.1</td>
<td class="change"><span class="yjSt">前日比</span><span class="icoUpGreen yjMSt">+2.5(+1.33%)</span></td>
</tr>
</table>'''

soup = BeautifulSoup(html, 'lxml')
print(soup.select_one('td[class="stoksPrice"]').text)
# 191.1

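Alternatively, you could pass a lambda to find to get the same result.

# t.get('class') returns None instead of raising KeyError if a td has no class attribute
print(soup.find(lambda t: t.name == 'td' and t.get('class') == ['stoksPrice']).text)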
# 191.1

Note: BeautifulSoup converts multi-valued class attributes into lists. So, the classes of the two td tags look like ['stoksPrice'] and ['stoksPrice', 'realTimChange'].
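
To see the list conversion described in the note, here is a minimal sketch. It assumes the corrected HTML (with the < restored before the second td), trims the markup down to the two relevant cells, and uses the built-in html.parser backend instead of lxml so that no extra parser is needed. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed-down, well-formed version of the table (an assumption for this sketch).
html = '''<table class="stocksTable">
<tr>
<td class="stoksPrice realTimChange"><div class="realTimChangeMod"></div></td>
<td class="stoksPrice">191.1</td>
</tr>
</table>'''

soup = BeautifulSoup(html, 'html.parser')

# Each td exposes its class attribute as a list of class names.
class_lists = [td['class'] for td in soup.find_all('td')]
print(class_lists)

# The attribute selector compares the whole attribute value, so only the
# td whose class is exactly "stoksPrice" matches.
exact = soup.select_one('td[class="stoksPrice"]')
price = exact.get_text(strip=True)
print(price)
```

The first print should show the two class lists, and the second the price text from the cell whose only class is stoksPrice.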

Upvotes: 2

johnashu

Reputation: 2211

Here is one way to do it using findAll.

The other stoksPrice cell is empty, so the only one whose text parses as a number is the one with the price.

You can use a try/except clause to check whether the text is a floating-point number.

If it is not, the loop continues iterating; if it is, the value is printed.

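# findAll is the older alias of find_all
res = soup.findAll("td", {"class": "stoksPrice"})
for r in res:
    try:
        # float() raises ValueError for cells that do not hold a number
        t = float(r.text)
        print(t)
    except ValueError:
        # not a number (e.g. the empty cell), so skip it
        pass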

191.1

Upvotes: 0

AbdealiLoKo

Reputation: 3337

There seem to be two issues:

First, your usage of find_all is incorrect. As written, you are searching for a tag named stoksPrice, which does not exist: the tags in your HTML are table, tr, td, div, span, etc. You need to change that to:

>>> res = soup.find_all(class_='stoksPrice')

to search for tags with that class.
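
The difference between the two calls can be sketched as follows. This assumes well-formed HTML (the missing < restored), a trimmed-down version of the table, and the built-in html.parser backend; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed-down, well-formed version of the table (an assumption for this sketch).
html = '''<table class="stocksTable">
<tr>
<td class="stoksPrice realTimChange"><div class="realTimChangeMod"></div></td>
<td class="stoksPrice">191.1</td>
</tr>
</table>'''

soup = BeautifulSoup(html, 'html.parser')

# A bare string is treated as a tag name; there is no <stoksPrice> tag.
by_name = soup.find_all('stoksPrice')
print(by_name)

# class_ matches any tag whose class list contains 'stoksPrice',
# so both td cells come back, including the empty one.
by_class = soup.find_all(class_='stoksPrice')
texts = [tag.get_text(strip=True) for tag in by_class]
print(texts)
```

Because the class search returns both cells, you would still need to pick the one that actually holds the price, for example by skipping cells with empty text.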

Second, your HTML is malformed. The line with stoksPrice is:

</td>
td class="stoksPrice">191.1</td>

it should have been:

</td>
<td class="stoksPrice">191.1</td>

(Note the < before the td.) I'm not sure whether that was a copy error when pasting into Stack Overflow or whether the original HTML is malformed, but if it is the latter, it is not going to be easy to parse ...
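
If the HTML really does arrive with the < missing, one possible fallback (a swapped-in technique, not BeautifulSoup) is to pull the value out of the raw text with a regular expression from the standard library. This is only a sketch: the pattern is an assumption tailored to this one fragment and would need adjusting for other pages.

```python
import re

# The malformed fragment from the question: the '<' before the second td is missing.
html = '''<td class="stoksPrice realTimChange">
<div class="realTimChangeMod">
</div>
</td>
td class="stoksPrice">191.1</td>'''

# Match class="stoksPrice" followed directly by '>' and a decimal number.
# The first td does not match because its class value continues with " realTimChange".
pattern = re.compile(r'class="stoksPrice">\s*([-+]?\d+(?:\.\d+)?)')
match = pattern.search(html)
price = float(match.group(1)) if match else None
print(price)
```

Working on the raw string sidesteps whatever the parser does with the broken tag, at the cost of being tied to this exact markup.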

Upvotes: 2
