Dawn17

Reputation: 8297

Using BeautifulSoup to retrieve information based on the attribute

<resultsummary>
    <resultticker category="executed">
        <count>12</count>
        <percentage>1.0</percentage>
        <id xlink:href="...">id_num</id>
        <id xlink:href="...">id_num</id>
    </resultticker>
    <resultticker category="done">
        <count>16</count>
        <percentage>0.6</percentage>
        <id xlink:href="...">id_num</id>
        <id xlink:href="...">id_num</id>
    </resultticker>
</resultsummary>

I am using BeautifulSoup4, and I get the response above from soup.find("resultsummary"). I want to retrieve the count inside each resultticker tag and key it by that tag's category attribute.

So I would like to get { executed: 12, done: 16 }.

I tried something like soup.find("resultsummary").find('resultticker')['category'], but that only gives me the value of the category attribute (executed), not the information inside the tag.

Any help? Thanks in advance.

Upvotes: 1

Views: 60

Answers (2)

Pedro Lobito

Reputation: 98921

You can use something like:

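final = {}
# find_all is the current name for the older findAll alias
for rt in soup.find_all('resultticker'):
    # key: the category attribute; value: the text of the nested <count> tag
    final[rt["category"]] = rt.count.text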

{'executed': '12', 'done': '16'}
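Note that the values above are strings, while the question asks for numbers. A minimal sketch of the same loop with an int conversion follows; the trimmed-down markup and the use of rt.find("count") instead of rt.count are my own choices for illustration, not part of the original snippet.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (assumption: only
# <count> matters here, so the other child tags are left out).
xml = """<resultsummary>
    <resultticker category="executed"><count>12</count></resultticker>
    <resultticker category="done"><count>16</count></resultticker>
</resultsummary>"""

soup = BeautifulSoup(xml, "html.parser")

final = {}
for rt in soup.find_all("resultticker"):
    # find("count") looks the child tag up by name explicitly;
    # int() turns the text "12" into the number 12.
    final[rt["category"]] = int(rt.find("count").text)

print(final)  # {'executed': 12, 'done': 16}
```

If the count could ever be something other than a whole number, int() would raise ValueError, so this only fits data shaped like the sample.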


Upvotes: 0

Rakesh

Reputation: 82765

Use:

from bs4 import BeautifulSoup

html = """<div>
    <resultsummary>
    <resultticker category="executed">
        <count>12</count>
        <percentage>1.0</percentage>
        <id xlink:href="...">id_num</id>
        <id xlink:href="...">id_num</id>
    </resultticker>
    <resultticker category="done">
        <count>16</count>
        <percentage>0.6</percentage>
        <id xlink:href="...">id_num</id>
        <id xlink:href="...">id_num</id>
    </resultticker>
</resultsummary>
</div>"""

result = {}
soup = BeautifulSoup(html, "html.parser")
for resultticker in soup.find("resultsummary").find_all('resultticker'):  # iterate over each resultticker
    result[resultticker['category']] = resultticker.count.text  # key = category, value = count text
print(result)

Output:

{'executed': '12', 'done': '16'}
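The loop above assumes every resultticker has both a category attribute and a count child. If the real response can be incomplete, a more defensive variant might look like the sketch below. The function name counts_by_category and the extra sample entries are hypothetical, made up here to show the skipping behaviour.

```python
from bs4 import BeautifulSoup


def counts_by_category(markup):
    """Map each resultticker's category to its count as an int.

    Hypothetical helper: tickers missing the category attribute or
    the <count> child are skipped instead of raising an error.
    """
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    for ticker in soup.find_all("resultticker"):
        category = ticker.get("category")   # None if the attribute is absent
        count_tag = ticker.find("count")    # None if there is no <count> child
        if category is None or count_tag is None:
            continue
        result[category] = int(count_tag.get_text(strip=True))
    return result


# Made-up sample: the second ticker has no category, the last has no count.
sample = """<resultsummary>
    <resultticker category="executed"><count> 12 </count></resultticker>
    <resultticker><count>3</count></resultticker>
    <resultticker category="done"><count>16</count></resultticker>
    <resultticker category="empty"></resultticker>
</resultsummary>"""

print(counts_by_category(sample))  # {'executed': 12, 'done': 16}
```

Using ticker.get("category") rather than ticker["category"] avoids a KeyError when the attribute is missing.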

Upvotes: 2
