Reputation: 11

Extracting data directly from HTML with BeautifulSoup

I have the following HTML data. I need to get just the "2" from it, using BeautifulSoup4:

<td rowspan="2" style="text-align: center; vertical-align: middle;">
    <small>3</small>
</td>

I tried:

k.find('rowspan')['style']

Which produced the exception:

Traceback (most recent call last):
  File "", line 1, in <module>
TypeError: list indices must be integers, not str

Is it possible to do this with BS4? Or should I use a different library to extract the CSS directly?

Upvotes: 0

Answers (2)

Dan Lenski

Reputation: 79762

Why are you using find("rowspan")? You are not searching for a <rowspan> tag.

The find method searches for tags based on the tag name when a single string parameter is passed.

What you should be using is something like this, which means, "find the first <td> tag with attribute value rowspan="2", and return the value of its style attribute":

k.find('td', rowspan="2")['style']

See the "Kinds of filters" section of the docs for the various ways of specifying which tags to search for.
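To make the difference between the filter styles concrete, here is a minimal sketch. It assumes the cell sits inside a small table and is parsed with the built-in 'html.parser'; the variable names (no_match, cell, any_rowspan) are only illustrative and do not come from the question.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the cell from the question, wrapped in a table.
html = '''
<table><tr>
<td rowspan="2" style="text-align: center; vertical-align: middle;">
    <small>3</small>
</td>
</tr></table>
'''
soup = BeautifulSoup(html, 'html.parser')

# A bare string is matched against tag names; there is no <rowspan> tag,
# so find() returns None here.
no_match = soup.find('rowspan')

# Keyword arguments filter on attribute values instead.
cell = soup.find('td', rowspan='2')

# Passing True matches any <td> that has a rowspan attribute at all.
any_rowspan = soup.find('td', rowspan=True)

# Once the tag is found, its attributes are read like dictionary keys.
style_value = cell['style']
rowspan_value = cell['rowspan']
```

Indexing the found tag with ['rowspan'] rather than ['style'] is what yields the "2" the question asks for; the value comes back as a string.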

Upvotes: 1

zverianskii

Reputation: 471

Try this:

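from bs4 import BeautifulSoup
# Parse the fragment with the parser that ships with Python.
soup = BeautifulSoup('<td rowspan="2" style="text-align: center; vertical-align: middle;"><small>3</small></td>', 'html.parser')
# soup.td is the first <td> tag; its attributes are read like dictionary keys.
print(soup.td['rowspan'])  # prints: 2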
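If more than the one attribute is needed, the same tag object offers a few related accessors. The following is a rough sketch under the same assumptions as above (same fragment, 'html.parser'); the names rowspan, colspan, all_attrs, and cell_text are made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<td rowspan="2" style="text-align: center; vertical-align: middle;">'
    '<small>3</small></td>',
    'html.parser',
)
td = soup.td

# Attribute values are strings, so convert explicitly when a number is needed.
rowspan = int(td['rowspan'])

# .get() returns None for a missing attribute instead of raising KeyError.
colspan = td.get('colspan')

# .attrs holds every attribute of the tag; copy it into a plain dict.
all_attrs = dict(td.attrs)

# The text inside the nested <small> tag.
cell_text = td.small.get_text()
```

Using .get() is the safer choice when some cells may lack the attribute, since td['colspan'] would raise KeyError on this fragment.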

Upvotes: 0
