Reputation: 39
I am currently scraping the following HTML from a website using BeautifulSoup, but I am struggling to extract and print the data I need.
For each list entry, I am looking to extract the data-qty value and the link text (the 4 in href="#">4). For example, from the first list entry I am trying to extract 4 and data-qty = 1.0000.
The code I am currently using is listed below the HTML.
<div class="content size-options size_us-options" data-sizegroup="size_us" style="display:none">
<ul class="sizes small-block-grid-4">
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="24" data-price="0" data-qty="1.0000" data-qtymad="0.0000" data-qtybcn="1.0000" data-oblocators="BBAI-0B-05-05" href="#">4</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="172" data-price="0" data-qty="4.0000" data-qtymad="0.0000" data-qtybcn="2.0000" data-oblocators="BBAI-0B-05-05" href="#">4.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="22" data-price="0" data-qty="10.0000" data-qtymad="0.0000" data-qtybcn="2.0000" data-oblocators="BBAI-0B-07-05" href="#">5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="160" data-price="0" data-qty="10.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-07-05" href="#">5.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="20" data-price="0" data-qty="9.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">6</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="165" data-price="0" data-qty="11.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">6.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="18" data-price="0" data-qty="28.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">7</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="110" data-price="0" data-qty="41.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">7.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="16" data-price="0" data-qty="53.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-05-05" href="#">8</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="121" data-price="0" data-qty="68.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-06-02;BBAI-0B-05-05" href="#">8.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="14" data-price="0" data-qty="85.0000" data-qtymad="0.0000" data-qtybcn="4.0000" data-oblocators="BBAI-0B-07-05" href="#">9</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="114" data-price="0" data-qty="64.0000" data-qtymad="0.0000" data-qtybcn="4.0000" data-oblocators="BBAI-0B-07-05" href="#">9.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="12" data-price="0" data-qty="71.0000" data-qtymad="0.0000" data-qtybcn="4.0000" data-oblocators="BBAI-0B-07-05" href="#">10</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="105" data-price="0" data-qty="59.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-07-05" href="#">10.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="10" data-price="0" data-qty="61.0000" data-qtymad="0.0000" data-qtybcn="3.0000" data-oblocators="BBAI-0B-07-05" href="#">11</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="117" data-price="0" data-qty="39.0000" data-qtymad="0.0000" data-qtybcn="2.0000" data-oblocators="BBAI-0B-07-05" href="#">11.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="8" data-price="0" data-qty="39.0000" data-qtymad="0.0000" data-qtybcn="2.0000" data-oblocators="BBAI-0B-07-05" href="#">12</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="202" data-price="0" data-qty="25.0000" data-qtymad="0.0000" data-qtybcn="0.0000" data-oblocators="" href="#">12.5</a>
</li>
<li>
<a rel="nofollow" class="size-button available" data-optionIndex="126" data-price="0" data-qty="26.0000" data-qtymad="0.0000" data-qtybcn="0.0000" data-oblocators="" href="#">13</a>
</li>
</ul>
</div>
This is the code I am currently using. I am struggling to extract and print the data I need, so any help would be appreciated!
soup = BeautifulSoup(response.content, 'html.parser')
ukattributes = soup.find('div', {'class':'content size-options size_uk-options'})
print(ukattributes)
sizes = ukattributes.findAll('li')
print(sizes)
for size in sizes:
    response = s.get(size.find('a')['href'])
    soup = BeautifulSoup(response.content, 'html.parser')
    print(size)
Please let me know if you can help, as I have been trying for a while now. Thanks again!
Upvotes: 0
Views: 1876
Reputation: 9420
You can't make a GET request to the URL #: everything after # is a fragment identifier, which is never sent to the server. The link is probably used by JavaScript on the page, or it just points back to the same page. See my answer to "Pagination giving the first page in every iteration" for more details. So:
response = s.get(size.find('a')['href'])
This will not work as you expect. To get the data you requested (note that the div in your HTML has the class size_us-options, not size_uk-options), try:
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.content, 'html.parser')
ukattributes = soup.find('div', {'class': 'content size-options size_us-options'})
print(ukattributes)
sizes = ukattributes.findAll('li')
print(sizes)
for size in sizes:
    # The <a> tag inside each <li> carries both values you need.
    link = size.find('a', href=True)
    print(link.text)          # size label, e.g. 4
    print(link["data-qty"])   # quantity, e.g. 1.0000
The loop then prints (first four sizes shown):
4
1.0000
4.5
4.0000
5
10.0000
5.5
10.0000
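To try the loop above without fetching the page, here is a self-contained sketch that parses a trimmed copy of the question's HTML (first three sizes only, some attributes dropped) from a string instead of response.content. To illustrate the point about href="#", it also resolves each href against a page URL with urllib.parse and checks whether the result, minus its fragment, is just the page itself. PAGE_URL is a made-up placeholder, and the class_ lookup and same-page check are my additions, not part of the answer above.

```python
from urllib.parse import urldefrag, urljoin

from bs4 import BeautifulSoup

# Hypothetical URL of the product page the HTML was downloaded from.
PAGE_URL = "https://www.example.com/product"

# Trimmed copy of the question's markup: first three sizes, fewer attributes.
HTML = """
<div class="content size-options size_us-options" data-sizegroup="size_us" style="display:none">
<ul class="sizes small-block-grid-4">
<li><a rel="nofollow" class="size-button available" data-qty="1.0000" href="#">4</a></li>
<li><a rel="nofollow" class="size-button available" data-qty="4.0000" href="#">4.5</a></li>
<li><a rel="nofollow" class="size-button available" data-qty="10.0000" href="#">5</a></li>
</ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
# class_ matches any one of the div's classes, so the full class string isn't needed.
container = soup.find("div", class_="size_us-options")

sizes = []
for link in container.find_all("a", href=True):
    # Resolve the href against the page URL, then drop the fragment.
    # For href="#" this is just the page itself, so there is nothing new to fetch.
    same_page = urldefrag(urljoin(PAGE_URL, link["href"])).url == PAGE_URL
    # The size label and quantity are already on the <a> tag.
    sizes.append((link.get_text(strip=True), link["data-qty"], same_page))

for label, qty, same_page in sizes:
    print(label, qty, same_page)
```

This should print 4 1.0000 True, 4.5 4.0000 True and 5 10.0000 True, one per line: every href resolves back to the page you already have, which is why requesting it in a loop gains nothing.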
Upvotes: 1
Reputation: 15376
You can use a simple list comprehension to select the data you need.
ukattributes = soup.find('div', {'class':'content size-options size_us-options'})
data = [ [a.text, a.get('data-qty')] for a in ukattributes.find_all('a') ]
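A runnable sketch of the comprehension, using a shortened copy of the question's HTML held in a string (an assumption, so it runs without a network request). The second comprehension is a hypothetical variant, not part of the answer: it builds a dict from size label to quantity and converts data-qty with float(), which is only useful if you want to compare or sum the quantities.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (first two sizes only).
html = """<div class="content size-options size_us-options">
<ul><li><a data-qty="1.0000" href="#">4</a></li>
<li><a data-qty="4.0000" href="#">4.5</a></li></ul></div>"""

soup = BeautifulSoup(html, "html.parser")
# Passing the full class string matches the exact value of the class attribute.
ukattributes = soup.find('div', {'class': 'content size-options size_us-options'})

# Same comprehension as above: one [size, qty] pair per <a> tag.
data = [[a.text, a.get('data-qty')] for a in ukattributes.find_all('a')]
print(data)  # [['4', '1.0000'], ['4.5', '4.0000']]

# Hypothetical variant: map size label -> quantity as a float.
stock = {a.text: float(a.get('data-qty')) for a in ukattributes.find_all('a')}
print(stock)  # {'4': 1.0, '4.5': 4.0}
```

a.get('data-qty') returns None instead of raising KeyError if a tag lacks the attribute, but float(None) would then fail, so filter such tags out first if the real page has them.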
Upvotes: 1