Reputation: 753
I'm using bs4 to pull product details from eBay listings, and I'm trying to get a result using this listing as an example. The code I think is closest to working is below:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, 'html.parser')
attributes = page_soup.findAll("div",{'class':'itemAttr'})
attribute = attributes[0]
row = attribute.tr.contents
The idea is to pull the webpage, parse the appropriate div (itemAttr), and then pull content from it using the tr/td tags or a combination thereof. Not included above are my numerous variations on this, but I keep hitting the same roadblock: the parse produces a list (with one item), and navigating through that list keeps failing. I did look at parsing the table directly, but unfortunately it hasn't been given a class. Are there any ideas on how to pull a table from a div tag, or perhaps create a new subset of HTML from the parse (as opposed to a list)? Or tell me if I've gone insane and should go to bed.
Upvotes: 1
Views: 1998
Reputation: 12381
I think your current work makes a lot of sense, good job!
To move ahead, we can leverage the structure of the td elements on the eBay page: they come in pairs, and the first cell of each pair (the label) carries an attrLabels class, which lets us extract the specific data.
This gives you the data in the same order as it appears on the page:
tds = attribute.findAll("td")
ordered_data = []
for i in range(0, len(tds), 2):
    if tds[i].get('class') == ['attrLabels']:
        key = tds[i].text.strip().strip(":")
        value = tds[i+1].span.text
        ordered_data.append({ key: value })
And this gives you the same thing but in a dict with key-value pairs so that you can easily access each attribute:
tds = attribute.findAll("td")
searchable_data = {}
for i in range(0, len(tds), 2):
    if tds[i].get('class') == ['attrLabels']:
        key = tds[i].text.strip().strip(":")
        value = tds[i+1].span.text
        searchable_data[key] = value
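If you want to try the pairing idea without hitting eBay, here is a minimal self-contained sketch. The SAMPLE_HTML markup and the extract_attributes helper are my own assumptions for illustration; the real page may be laid out differently. It also makes two small swaps from the snippets above: find() to get a single Tag rather than a one-item list, and get_text(strip=True) instead of .span.text so that a value cell without a span does not raise an error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the label/value layout described above;
# the real eBay page may differ.
SAMPLE_HTML = """
<div class="itemAttr">
  <table>
    <tr>
      <td class="attrLabels">Condition:</td>
      <td><span>New</span></td>
      <td class="attrLabels">Brand:</td>
      <td><span>Acme</span></td>
    </tr>
    <tr>
      <td class="attrLabels">Colour:</td>
      <td><span>Blue</span></td>
    </tr>
  </table>
</div>
"""


def extract_attributes(html):
    """Return a dict of label -> value from the itemAttr div, or {} if it is absent."""
    page_soup = BeautifulSoup(html, "html.parser")
    # find() returns a single Tag (or None) instead of a one-item list.
    attribute = page_soup.find("div", {"class": "itemAttr"})
    if attribute is None:
        return {}
    tds = attribute.find_all("td")
    data = {}
    # Walk the cells as (label, value) pairs; zip stops early if a cell is left over.
    for label_td, value_td in zip(tds[0::2], tds[1::2]):
        if "attrLabels" in (label_td.get("class") or []):
            key = label_td.get_text(strip=True).rstrip(":")
            data[key] = value_td.get_text(strip=True)
    return data


attributes = extract_attributes(SAMPLE_HTML)
print(attributes)
```

Using zip over the even and odd slices avoids indexing tds[i+1] by hand, so an unpaired trailing cell is skipped rather than causing an IndexError.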
Upvotes: 3