Reputation: 33
I am using BS4 to parse a website and extract some part numbers and details. I can find the class 'manufDetailList', which contains the values I am trying to retrieve from the site. However, I am unable to retrieve the actual values from these fields: pdpProductBrandName - Stronghand Tools, pdpProductSKU - 02139254, pdpProductMPN - MST327.
I have read a number of BS4 starter tutorials, but I cannot find anything that helps me extract the values as required.
In [11]: page.find_all(class_='manufDetailList')
Out[11]: [<div class="manufDetailList">
<ul>
<li>Stronghand Tools</li>
<input name="pdpProductBrandName" type="hidden" value="Stronghand Tools"/>
<li>BW#:<span class="hobsondata">02139254</span></li>
<input name="pdpProductSKU" type="hidden" value="02139254"/>
<li>Mfr#:<span class="hobsondata">MST327</span></li>
<input name="pdpProductMPN" type="hidden" value="MST327"/>
<input name="categoryName" type="hidden" value="Tools - Hand, Measuring & Precision/Clamps – Magnetic/Corner – Pre Tooling"/>
<li>UNSPSC#:<span class="hobsondata">27112121</span></li>
</ul>
</div>]
Upvotes: 1
Views: 184
Reputation: 84475
You want the value attribute, and you can match the required element(s) using the name attribute:
soup.select_one('[name="pdpProductBrandName"]')['value']
Same idea for each of the others.
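As a minimal sketch of repeating that lookup for the other fields, the snippet below loops over the three input names from the question. The html string is a trimmed copy of the markup shown in the question, standing in for the real page, and the names field_names and details are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (an assumption: the live page
# may contain more elements around this block).
html = """
<div class="manufDetailList">
<ul>
<li>Stronghand Tools</li>
<input name="pdpProductBrandName" type="hidden" value="Stronghand Tools"/>
<li>BW#:<span class="hobsondata">02139254</span></li>
<input name="pdpProductSKU" type="hidden" value="02139254"/>
<li>Mfr#:<span class="hobsondata">MST327</span></li>
<input name="pdpProductMPN" type="hidden" value="MST327"/>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Illustrative list of the hidden inputs we care about.
field_names = ["pdpProductBrandName", "pdpProductSKU", "pdpProductMPN"]

details = {}
for name in field_names:
    tag = soup.select_one(f'[name="{name}"]')
    # select_one returns None when nothing matches, so guard before indexing.
    details[name] = tag["value"] if tag is not None else None

print(details)
```

Guarding against None avoids a TypeError if one of the inputs is missing on a particular product page.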
You could add the parent class if required:
soup.select_one('.manufDetailList [name="pdpProductBrandName"]')['value']
It is worth reading up on CSS attribute selectors. The [] denotes an attribute selector.
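Attribute selectors can also match part of an attribute value. The sketch below uses the prefix form [name^="..."] to collect every input whose name starts with pdpProduct in one call. It assumes the markup from the question (trimmed here into a string), and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, used as stand-in input.
html = """
<div class="manufDetailList">
<ul>
<input name="pdpProductBrandName" type="hidden" value="Stronghand Tools"/>
<input name="pdpProductSKU" type="hidden" value="02139254"/>
<input name="pdpProductMPN" type="hidden" value="MST327"/>
<input name="categoryName" type="hidden" value="Tools - Hand, Measuring & Precision/Clamps – Magnetic/Corner – Pre Tooling"/>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# [name^="pdpProduct"] matches elements whose name attribute starts with
# "pdpProduct", so the categoryName input is skipped.
inputs = soup.select('.manufDetailList input[name^="pdpProduct"]')

# Map each input's name to its value attribute.
values = {tag["name"]: tag["value"] for tag in inputs}

print(values)
```

This avoids listing each field name, at the cost of relying on the shared pdpProduct prefix, which may not hold on other pages.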
Upvotes: 1