Sophia

Reputation: 33

How to get <li> tag information (BeautifulSoup Webscraping)?

I am scraping the information from this page:
https://lawyers.justia.com/lawyer/michael-paul-ehline-85006. I am trying to scrape everything under the Fees section. Specifically, I want the following information:

Free Consultation: Yes
Credit Cards Accepted: Visa, Mastercard, American Express
Contingent Fees: In personal injury cases only.
Rates, Retainers and Additional Information: Rates vary on a case by case basis.

This is what I have tried:

for thing in soup.findAll('ul', attrs={"class": "has-no-list-styles"}):
   ul=thing.find('<li>')
   print(ul)

but the output is:

<li>Intellectual Property</li>
<li>Copyright Law</li>
<li><strong>English</strong></li>

Thank you in advance.

UPDATE: I found a solution, but it gives me an infinite loop. Any suggestions?

for o in soup.findAll('div', attrs={"class": "block-wrapper"}):     
    for tag in soup.findAll('div', attrs={"class": "block-wrapper"}):
        if tag.string:
            tag.string.replace_with("")
        for de in o.findAll("li"):
            if de != []:
                de=remove_tags(str(de))
                print (de)

Upvotes: 1

Views: 61

Answers (2)

Abe

Reputation: 441

Try this. It was inspired by dabingsou's answer: it looks for the icon he describes, moves to that icon's parent's next sibling, and grabs that sibling's text.

import requests
from bs4 import BeautifulSoup

URL = "https://lawyers.justia.com/lawyer/michael-paul-ehline-85006"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
# Locate the fee icon, then read the text of the node that follows its parent.
icon = soup.find('span', attrs={"class": "jicon -large jicon-fee"})
print(icon.parent.next_sibling.text)

Adapt your scraper along these lines and see if that helps!
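If the next sibling turns out to be a whitespace text node rather than the list, `find_next` may be more forgiving, and it also lets you read each `<li>` separately. Here is a minimal sketch, assuming the fee icon is followed by a `<ul class="has-no-list-styles">` as the other answer suggests. `SAMPLE_HTML` and `fee_items` are hypothetical names, and the markup is a stand-in for the real page, not a copy of it:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fees block; the real page markup may differ.
SAMPLE_HTML = """
<div class="block-wrapper">
  <h3><span class="jicon -large jicon-fee"></span> Fees</h3>
  <ul class="has-no-list-styles">
    <li><strong>Free Consultation</strong> Yes</li>
    <li><strong>Credit Cards Accepted</strong> Visa, Mastercard, American Express</li>
  </ul>
</div>
"""

def fee_items(html):
    """Return the text of each <li> in the first <ul> after the fee icon."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any tag whose class list contains "jicon-fee".
    icon = soup.find("span", class_="jicon-fee")
    if icon is None:
        return []
    # find_next walks forward in document order, skipping whitespace nodes.
    fee_list = icon.find_next("ul")
    if fee_list is None:
        return []
    # The " " separator keeps the label and its value from running together.
    return [li.get_text(" ", strip=True) for li in fee_list.find_all("li")]

for item in fee_items(SAMPLE_HTML):
    print(item)
```

To try it on the live page, pass `r.content` from the request above to `fee_items` instead of the sample string.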

Upvotes: 0

dabingsou

Reputation: 2469

Try this.

from simplified_scrapy import SimplifiedDoc,req
html = req.get('https://lawyers.justia.com/lawyer/michael-paul-ehline-85006')
doc = SimplifiedDoc(html)
ul = doc.getElement('ul',attr='class',value='has-no-list-styles',start='class="jicon -large jicon-fee"') # Use class="jicon -large jicon-fee" to locate
print (ul.text)

Result:

Free ConsultationYesCredit Cards AcceptedVisa, Mastercard, American ExpressContingent FeesIn personal injury cases only.Rates, Retainers and Additional InformationRates vary on a case by case basis.
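The labels and values run together in that result because the text nodes are joined with nothing between them. If you would rather stay with BeautifulSoup (swapped in here for simplified_scrapy), its `get_text` accepts a separator. This is a rough sketch on a hypothetical fragment shaped like the fees list, not the page's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the fees list; not copied from the real page.
html = (
    '<ul class="has-no-list-styles">'
    "<li><strong>Free Consultation</strong>Yes</li>"
    "<li><strong>Contingent Fees</strong>In personal injury cases only.</li>"
    "</ul>"
)

fees = BeautifulSoup(html, "html.parser").find("ul", class_="has-no-list-styles")

# Without a separator, adjacent text nodes are concatenated directly.
glued = fees.get_text()
# With a separator, each stripped text node is joined by a single space.
spaced = fees.get_text(separator=" ", strip=True)

print(glued)
print(spaced)
```

Using `separator="\n"` instead would put each label and value on its own line, which may be easier to split afterwards.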

Upvotes: 1
