Jos

Reputation: 55

Beautiful Soup - getting tag contents

The HTML I have (there are multiple of these entries) looks like this:

<p class="number-values">
    <span class="text">Count:</span>
    <span data-value="10000" name="nv">10,000</span>
    <span class="devider">#</span> <span class="text">Number:</span>
    <span data-value="500,000" name="nv">0.05</span>
</p>

Now I'm looking to get the contents of the spans that carry a data-value attribute. What I've written so far is:

import urllib.request
import bs4 as bs

url = "http://example.com"
source = urllib.request.urlopen(url).read()
soup = bs.BeautifulSoup(source, "lxml")

contents = soup.find_all("p", class_="number-values")

for content in contents:
    print(content.text)  # all text inside the <p>, including its child spans

However, it outputs this (including a lot of blank lines I couldn't account for):

Count:

10,000

#

Number:

0.05

I can't seem to find the right tag to extract. Should I just run a regex over the entire string?
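The stray line breaks do not need a regex. With this markup, .string on the <p> returns None because the element has several child nodes, and .text (get_text()) returns every text node inside it, including the whitespace-only nodes between the spans, which is where the line breaks come from. A minimal sketch, assuming bs4 is installed, using the stdlib "html.parser" instead of lxml, and inlining the question's markup as a hypothetical HTML constant:

```python
from bs4 import BeautifulSoup

# Hypothetical constant: the markup from the question, inlined so the sketch runs offline.
HTML = """
<p class="number-values">
    <span class="text">Count:</span>
    <span data-value="10000" name="nv">10,000</span>
    <span class="devider">#</span> <span class="text">Number:</span>
    <span data-value="500,000" name="nv">0.05</span>
</p>
"""

# html.parser ships with Python, so lxml is not needed here.
soup = BeautifulSoup(HTML, "html.parser")
p = soup.find("p", class_="number-values")

# .string is None: the <p> has more than one child node.
print(p.string)

# stripped_strings skips whitespace-only text nodes and trims the others.
pieces = list(p.stripped_strings)
print(pieces)

# get_text with a separator and strip=True gives one tidy line instead.
joined = p.get_text(" ", strip=True)
print(joined)
```

This still returns the labels ("Count:", "#", "Number:") along with the numbers, so it only tidies the output; picking out the values themselves means selecting the spans.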

Upvotes: 1

Views: 7132

Answers (2)

ᴀʀᴍᴀɴ

Reputation: 4528

It does make sense: because you gave it the class of the p tag, it returns all the text of that tag's children. If you want just 10,000 and 0.05, you should search for span tags that have the attribute name="nv":

for content in soup.find_all("span", {"name": "nv"}):
    print(content.text)  # 10,000, then 0.05
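The dictionary is needed because find_all's first parameter is itself called name, so name="nv" cannot be passed as a keyword argument. Since the question also mentions data-value, here is a sketch that reads both the visible text and the data-value attribute from each matching span, grouped per <p> entry. It assumes bs4 is installed, uses the stdlib "html.parser", and inlines the question's markup as a hypothetical HTML constant:

```python
from bs4 import BeautifulSoup

# Hypothetical constant: the markup from the question, inlined so the sketch runs offline.
HTML = """
<p class="number-values">
    <span class="text">Count:</span>
    <span data-value="10000" name="nv">10,000</span>
    <span class="devider">#</span> <span class="text">Number:</span>
    <span data-value="500,000" name="nv">0.05</span>
</p>
"""

soup = BeautifulSoup(HTML, "html.parser")

rows = []
# Search inside each <p class="number-values"> so values from one entry stay together.
for p in soup.find_all("p", class_="number-values"):
    # attrs={"name": "nv"} filters on the HTML name attribute, not the tag name.
    for span in p.find_all("span", attrs={"name": "nv"}):
        # get_text(strip=True) is the displayed text; span["data-value"] is the attribute.
        rows.append((span.get_text(strip=True), span["data-value"]))

for text, value in rows:
    print(text, value)
```

Indexing with span["data-value"] raises KeyError when a span lacks the attribute; span.get("data-value") returns None instead, which may be safer on messier pages.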

Upvotes: 2

Pushkr

Reputation: 3619

Try passing the class in an attribute dictionary:

contents = soup.find_all("p", {"class":"number-values"})
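Passing {"class": "number-values"} filters the same way as class_="number-values" (class is a reserved word in Python, hence the trailing underscore in the keyword form), so both return the same <p> elements. A sketch comparing the two, plus a CSS-selector alternative that goes straight to the spans carrying data-value. It assumes bs4 4.7 or later (where select() is backed by the soupsieve package), the stdlib "html.parser", and the same hypothetical HTML constant:

```python
from bs4 import BeautifulSoup

# Hypothetical constant: the markup from the question, inlined so the sketch runs offline.
HTML = """
<p class="number-values">
    <span class="text">Count:</span>
    <span data-value="10000" name="nv">10,000</span>
    <span class="devider">#</span> <span class="text">Number:</span>
    <span data-value="500,000" name="nv">0.05</span>
</p>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Two spellings of the same class filter; both match the whole <p>.
by_keyword = soup.find_all("p", class_="number-values")
by_attrs = soup.find_all("p", {"class": "number-values"})
print(by_keyword == by_attrs)

# A CSS selector narrows to spans with a data-value attribute in one step.
values = [span["data-value"] for span in soup.select("p.number-values span[data-value]")]
print(values)
```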

Upvotes: 0
