Sharyar Vohra

Reputation: 306

Scraping Specific Child Elements

I want to scrape the page below and end up with two lists:

ListA = ["Driver Convenience","Exterior Features"]

ListB = ["2 key fob;Collision mitigation braking system;","Body coloured plastic front bumper;Boulder grey exterior door handle;Boulder grey exterior door mirrorn;"]

ListA should contain the text of each h4 tag, and ListB should contain, for each heading, the text of the li tags that follow it up to the next h4 tag, joined with semicolons.

Here is a sample of the HTML:

<ul class="c-list-table">   
    <h4 class="c-list-table__section-heading">Driver Convenience</h4>
<li class="c-list-table__item" rel="2-key-fob"><span class="c-list-table__title"> 2 key fob </span></li>
<li class="c-list-table__item" rel="collision-mitigation-braking-system">Collision mitigation braking system</li>
    <h4 class="c-list-table__section-heading">Exterior Features</h4>
<li class="c-list-table__item" rel="body-coloured-plastic-front-bumper">Body coloured plastic front bumper</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-handle">Boulder grey exterior door handle</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-mirror">Boulder grey exterior door mirrorn</li>
</ul>

The real page has the same structure as this sample. I have tried many approaches but could not get it to work.

Upvotes: 1

Views: 95

Answers (1)

KunduK

Reputation: 33384

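Use find_next_siblings('li') to collect the li tags that follow each h4. That call also returns the li tags under later headings, so keep only those whose nearest preceding h4, found with find_previous_sibling('h4'), matches the current heading, and join their text into one string per heading.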

from bs4 import BeautifulSoup
data='''     
<ul class="c-list-table">   
<h4 class="c-list-table__section-heading">Driver Convenience</h4>
<li class="c-list-table__item" rel="2-key-fob"><span class="c-list-table__title"> 2 key fob </span></li>
<li class="c-list-table__item" rel="collision-mitigation-braking-system">Collision mitigation braking system</li>
<h4 class="c-list-table__section-heading">Exterior Features</h4>
<li class="c-list-table__item" rel="body-coloured-plastic-front-bumper">Body coloured plastic front bumper</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-handle">Boulder grey exterior door handle</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-mirror">Boulder grey exterior door mirrorn</li>
</ul>'''

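ListA = []  # heading text from each h4
ListB = []  # one ';'-terminated string of li texts per heading
soup = BeautifulSoup(data, 'lxml')
for item in soup.find_all('h4'):
    lifinal = ""
    ListA.append(item.text)
    # find_next_siblings returns every later li, including those under later headings
    nextlis = item.find_next_siblings('li')
    for li in nextlis:
        # keep only the li tags whose nearest preceding h4 is the current heading
        if li.find_previous_sibling('h4') is item:
            lifinal = lifinal + li.text.strip() + ";"
    ListB.append(lifinal)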

print(ListA)
print(ListB)

Output:

['Driver Convenience', 'Exterior Features']
['2 key fob;Collision mitigation braking system;', 'Body coloured plastic front bumper;Boulder grey exterior door handle;Boulder grey exterior door mirrorn;']
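The same grouping can also be done in a single pass over the document, which avoids re-scanning the siblings for every heading. The following is only a sketch of that alternative, not part of the original answer: the function name group_items_by_heading and the shortened sample markup are made up for illustration, and it assumes every li belongs to the most recent h4 that appears before it in the document. It uses the built-in html.parser so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample with the same shape as the question's HTML.
html = """
<ul class="c-list-table">
<h4 class="c-list-table__section-heading">Driver Convenience</h4>
<li class="c-list-table__item"><span class="c-list-table__title"> 2 key fob </span></li>
<li class="c-list-table__item">Collision mitigation braking system</li>
<h4 class="c-list-table__section-heading">Exterior Features</h4>
<li class="c-list-table__item">Body coloured plastic front bumper</li>
</ul>
"""


def group_items_by_heading(markup):
    """Return (headings, items): li texts grouped under the preceding h4."""
    soup = BeautifulSoup(markup, "html.parser")
    headings, groups = [], []
    # find_all with a list of names yields the tags in document order.
    for tag in soup.find_all(["h4", "li"]):
        if tag.name == "h4":
            # A heading starts a new, empty group.
            headings.append(tag.get_text(strip=True))
            groups.append([])
        elif groups:
            # An li is added to the group of the latest heading;
            # li tags that appear before any h4 are skipped.
            groups[-1].append(tag.get_text(strip=True))
    # Match the requested format: each item followed by ';'.
    items = ["".join(text + ";" for text in group) for group in groups]
    return headings, items


list_a, list_b = group_items_by_heading(html)
print(list_a)
print(list_b)
```

Because the tags are visited in document order, this version does not depend on the h4 and li tags being direct siblings, which can vary with how a given parser repairs an h4 placed directly inside a ul.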

Upvotes: 1
