ArdsonW

Reputation: 3

How to scrape values of an array under a "ul" tag using BeautifulSoup?

I need to get values from a "ul" element, but it contains no "li" items. Instead, it contains a custom tag whose attributes hold array values, like below.

<div class ="family">
<ul class ="age">
<ll-per-person count ="[4, 36, 60]" extracount="[]"></ll-per-person>
</ul>
</div>

I want to retrieve the count values. This is the code I have tried in Python:

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('div', attrs={'class': 'family'})
for ul in table.findAll('ul', attrs={'class': 'age'}):
    print(ul)
    for li in ul.findAll('ll-per-person'):
        print(li)
        for numbers in li.findAll(attrs = {"ll-per-person" : "count"}):
            print(numbers)

I get output for "print(ul)" and "print(li)", but nothing for "print(numbers)", and no error is raised either. I need the values of count, which is an array. How can I do that?

Upvotes: 0

Views: 525

Answers (3)

Tanish Sarmah

Reputation: 486

Since the "ul" tag has <ll-per-person count="[4, 36, 60]" extracount="[]"></ll-per-person> as its second child (use soup.ul.contents to view the children; the first child is the newline text node), we can access it by index and read the value of its count attribute.

from bs4 import BeautifulSoup as bs
html_doc = """
<div class ="family">
<ul class ="age">
<ll-per-person count ="[4, 36, 60]" extracount="[]"></ll-per-person>
</ul>
</div>"""
soup = bs(html_doc,'html.parser')
tag_ll = soup.ul.contents[1]

print(tag_ll['count'])
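
Indexing into contents depends on the whitespace between the tags, so as a minimal sketch (assuming the same markup as in the question), the tag can also be looked up by name, which does not depend on its position among the children:

```python
from bs4 import BeautifulSoup

html_doc = """
<div class ="family">
<ul class ="age">
<ll-per-person count ="[4, 36, 60]" extracount="[]"></ll-per-person>
</ul>
</div>"""

soup = BeautifulSoup(html_doc, "html.parser")

# contents[0] is the newline text node that follows <ul ...>,
# which is why the custom tag sits at index 1.
children = soup.ul.contents

# Looking the tag up by name does not rely on that whitespace.
tag_ll = soup.ul.find("ll-per-person")

# Attribute values are plain strings, so this is the text "[4, 36, 60]".
count_raw = tag_ll["count"]
print(count_raw)
```

Note that count_raw is still a string at this point, not a Python list.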

Upvotes: 0

Andrej Kesely

Reputation: 195553

The count attribute is a string that happens to contain a JSON array. To extract the numbers from the <ll-per-person> tag, you can parse it with the json module, for example:

import json
from bs4 import BeautifulSoup

html_doc = """
<div class ="family">
<ul class ="age">
<ll-per-person count="[4, 36, 60]" extracount="[]"></ll-per-person>
</ul>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for item in soup.select("ll-per-person"):
    lst = json.loads(item["count"])
    print("Numbers are:")
    for number in lst:
        print(number)

Prints:

Numbers are:
4
36
60
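
If the attribute might be missing or might not contain valid JSON on some pages, the parsing step can be wrapped in a small helper. This is only a sketch: parse_json_list is a hypothetical name, and returning an empty list on failure is an assumption about what the caller wants.

```python
import json
from bs4 import BeautifulSoup


def parse_json_list(tag, attr_name):
    """Return the attribute parsed as a list, or [] if it is missing or invalid.

    Hypothetical helper, not part of BeautifulSoup.
    """
    raw = tag.get(attr_name)  # Tag.get returns None when the attribute is absent
    if raw is None:
        return []
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return []
    # Only accept JSON arrays; anything else is treated as "no values".
    return value if isinstance(value, list) else []


html_doc = (
    '<ul class="age">'
    '<ll-per-person count="[4, 36, 60]" extracount="[]"></ll-per-person>'
    "</ul>"
)
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.find("ll-per-person")

counts = parse_json_list(tag, "count")
extra = parse_json_list(tag, "extracount")
missing = parse_json_list(tag, "no-such-attribute")
print(counts, extra, missing)
```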

Upvotes: 0

imxitiz

Reputation: 3987

count is an attribute of the ll-per-person element, not a child tag, so findAll cannot find it. You can read an element's attribute by indexing the tag with the attribute name, like this:

for li in ul.findAll('ll-per-person'):
    print(li["count"])
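
To illustrate why the innermost loop in the question printed nothing, here is a self-contained sketch (assuming the markup from the question and the html.parser backend). The attrs filter searches the descendants of the tag for elements that have an attribute named "ll-per-person" with the value "count", and no such element exists:

```python
from bs4 import BeautifulSoup

html_doc = """
<div class ="family">
<ul class ="age">
<ll-per-person count ="[4, 36, 60]" extracount="[]"></ll-per-person>
</ul>
</div>"""

soup = BeautifulSoup(html_doc, "html.parser")
li = soup.find("ll-per-person")

# The original filter: looks for descendant tags carrying an attribute
# called "ll-per-person" whose value is "count". Nothing matches.
no_match = li.find_all(attrs={"ll-per-person": "count"})
print(len(no_match))

# The attributes live on the tag itself, as a dict of strings.
attributes = li.attrs
print(attributes["count"])
```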

If this solves your problem, don't forget to mark it as the accepted answer.

Upvotes: 1