Mark Clements

Reputation: 465

BeautifulSoup Unable to Find Classes with Hyphens in Their Names

I am using BeautifulSoup4 on Mac OS X with Python 2.7.8, and I am having difficulty extracting information from the following HTML:

 <tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
      <tr id="yui-rec0" class="yui-dt-first yui-dt-even">
           <td headers="yui-dt0-th-rank" class="rank yui-dt0-col-rank"></td>
           </tr>
      <tr id="yui-rec1" class="yui-dt-odd">...</tr>
      <tr id="yui-rec2" class="yui-dt-even">...</tr>
 </tbody>

I can't seem to grab the table or any of its contents, because BS and/or Python doesn't seem to recognize values with hyphens. So the usual code, something like

 Table = soup.find('tbody',{'class':'yui-dt-data'})

or

 Row2 = Table.find('tr',{'id':'yui-rec2'})

just returns an empty object (not None, simply empty). I'm not new to BS4 or Python, and I've extracted information from this site before, but the class names have changed since I last did it. Now everything has hyphens. Is there a way to get Python to recognize the hyphen, or is there a workaround?

I need to have my code be general so that I can run it across numerous pages that all have the same class name. Unfortunately, the id attribute in <tbody> is unique to that particular table, so I can't use that to identify this table across webpages.

Any help would be appreciated. Thanks in advance.

Upvotes: 5

Views: 5050

Answers (3)

QHarr

Reputation: 84455

Just use select with a CSS class selector (bs4 4.7.1).

from bs4 import BeautifulSoup as bs

html = '''
<tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
      <tr id="yui-rec0" class="yui-dt-first yui-dt-even">
           <td headers="yui-dt0-th-rank" class="rank yui-dt0-col-rank"></td>
           </tr>
      <tr id="yui-rec1" class="yui-dt-odd">...</tr>
      <tr id="yui-rec2" class="yui-dt-even">...</tr>
 </tbody>
 '''
soup = bs(html, 'lxml')
# '.yui-dt-data' is a CSS class selector; the hyphens need no escaping
print(soup.select('.yui-dt-data'))
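
If it helps, here is a slightly fuller sketch of the same idea that combines the class and id selectors from the question. It is only an illustration: the variable names are made up, the rows are wrapped in a table and closed properly, and it assumes the built-in html.parser rather than lxml.

```python
from bs4 import BeautifulSoup

# Markup adapted from the question (assumption: rows are properly closed)
html = '''
<table>
<tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
  <tr id="yui-rec0" class="yui-dt-first yui-dt-even"><td class="rank yui-dt0-col-rank"></td></tr>
  <tr id="yui-rec1" class="yui-dt-odd"><td>...</td></tr>
  <tr id="yui-rec2" class="yui-dt-even"><td>...</td></tr>
</tbody>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# Select the table body by its hyphenated class name
table = soup.select_one('tbody.yui-dt-data')

# Collect the ids of every row inside it, in document order
row_ids = [tr['id'] for tr in table.select('tr')]

# Combine a class selector with an id selector in one expression
row2 = soup.select_one('tbody.yui-dt-data tr#yui-rec2')

# A class selector matches any row that carries the class,
# including rows with several classes
even_rows = soup.select('tr.yui-dt-even')

print(row_ids)
print(row2['class'])
print([tr['id'] for tr in even_rows])
```

Because the selector depends only on the class name, the same call should work across pages where the tbody id differs.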

Upvotes: 1

孟庆良

Reputation: 61

For anyone looking for a way to find a tag with a hyphen in an attribute name, the documentation covers it: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-keyword-arguments

This code raises an error, because data-foo is not a valid Python keyword argument name:

from bs4 import BeautifulSoup

data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression

Instead, pass the attribute in a dictionary through attrs:

data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]
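
To make the distinction concrete, here is a small sketch (my own example markup, with html.parser assumed) showing that the restriction applies only to hyphens in the attribute name. A hyphen in the attribute value, such as a class name, is just part of a string and works with ordinary keyword arguments.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one hyphenated attribute name, one hyphenated class value
html = '<div data-foo="value" class="yui-dt-data">foo!</div>'
soup = BeautifulSoup(html, 'html.parser')

# Hyphen in the attribute NAME: not a valid Python identifier,
# so it has to go through the attrs dictionary
by_name = soup.find_all(attrs={'data-foo': 'value'})

# Hyphen only in the attribute VALUE: a plain string, so the
# class_ keyword argument works as usual
by_class = soup.find_all('div', class_='yui-dt-data')

print(by_name)
print(by_class)
```

Both calls should return the same single div here.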

Upvotes: 6

user3391564

Reputation: 576

The following code:

from bs4 import BeautifulSoup

htmlstring = """ <tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
      <tr id="yui-rec0" class="yui-dt-first yui-dt-even">
      <tr id="yui-rec1" class="yui-dt-odd">
      <tr id="yui-rec2" class="yui-dt-even">"""


# Name the parser explicitly; the built-in html.parser is assumed here
soup = BeautifulSoup(htmlstring, 'html.parser')
Table = soup.find('tbody', attrs={'class': 'yui-dt-data'})
print("Table:\n")
print(Table)
tr = Table.find('tr', attrs={'class': 'yui-dt-odd'})
print("tr:\n")
print(tr)

outputs:

Table:

<tbody class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650" tabindex="0">
<tr class="yui-dt-first yui-dt-even" id="yui-rec0">
<tr class="yui-dt-odd" id="yui-rec1">
<tr class="yui-dt-even" id="yui-rec2"></tr></tr></tr></tbody>
tr:

<tr class="yui-dt-odd" id="yui-rec1">
<tr class="yui-dt-even" id="yui-rec2"></tr></tr>

Even though the HTML you supplied isn't valid by itself, BS appears to guess at the intended structure, because soup.prettify() yields

<tbody class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650" tabindex="0">
 <tr class="yui-dt-first yui-dt-even" id="yui-rec0">
  <tr class="yui-dt-odd" id="yui-rec1">
   <tr class="yui-dt-even" id="yui-rec2">
   </tr>
  </tr>
 </tr>
</tbody>

I'm guessing those tr elements aren't supposed to be nested, though.

Could you try running that exact code and seeing what the output is?
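
For comparison, here is a sketch of the same lookup with each tr closed explicitly, which is my assumption about what the real page contains. With html.parser assumed again, the rows should come out as siblings rather than nested, and the hyphenated class lookups behave the same way.

```python
from bs4 import BeautifulSoup

# Same ids and classes as above, but every tr is closed (assumed structure)
html = """<table><tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
<tr id="yui-rec0" class="yui-dt-first yui-dt-even"></tr>
<tr id="yui-rec1" class="yui-dt-odd"></tr>
<tr id="yui-rec2" class="yui-dt-even"></tr>
</tbody></table>"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('tbody', attrs={'class': 'yui-dt-data'})

# recursive=False restricts the search to direct children of tbody
rows = table.find_all('tr', recursive=False)
print(len(rows))

# The odd row is found by its hyphenated class and contains no nested tr
odd = table.find('tr', attrs={'class': 'yui-dt-odd'})
print(odd['id'])
print(odd.find('tr'))
```

If all three rows show up as direct children on your real page, the hyphens are not the problem and the nesting above is only an artifact of the truncated sample.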

Upvotes: 6
