Reputation: 115
I'm trying to find all divs whose class contains the substring 'auction-results' and then extract the full class name. Here's an example:
<div class="auction-results high-bid has-price"></div>
I can find all the divs that contain 'auction-results' like so:
results = soup.select("div[class*=auction-results]")
results
Out: [<div class="auction-results high-bid has-price">
<i class="icon"></i>
<span class="lot-price"> $700,000</span>
</div>]
type(results)
Out: bs4.element.ResultSet
What I want is to store the entire class name 'auction-results high-bid has-price' in a pandas column like so:
class_text = ['auction-results high-bid has-price']
'auction-results high-bid has-price'
scraped_data = pd.DataFrame({'class_text': class_text})
scraped_data
class_text
0 auction-results high-bid has-price
I haven't found a solution yet so I hope someone can help me out, thanks!
Upvotes: 1
Views: 454
Reputation: 1938
See the example below. You can treat the markup as an HTML document and use lxml to extract the full value of the class attribute.
from lxml import html
results = '<div class="auction-results high-bid has-price"><i class="icon"></i><span class="lot-price">$700,000</span></div>'
tree = html.fromstring(results)
name = tree.xpath("//div[contains(@class,'auction-results')]/@class")
print(name)
It prints the full class name:
['auction-results high-bid has-price']
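Since the question asks for a pandas column, here is a minimal sketch of how the XPath result could feed a DataFrame when the page has several matching divs. The `page` string is a made-up sample (the second and third divs are assumptions added for illustration), and the `class_text` column name is taken from the question.

```python
import pandas as pd
from lxml import html

# Hypothetical sample page: two matching divs and one that should be skipped.
page = (
    "<html><body>"
    '<div class="auction-results high-bid has-price">'
    '<i class="icon"></i><span class="lot-price">$700,000</span></div>'
    '<div class="auction-results no-sale"></div>'
    '<div class="lot-details"></div>'
    "</body></html>"
)

tree = html.fromstring(page)

# The XPath yields one string per matching div: its complete class attribute.
matches = tree.xpath("//div[contains(@class, 'auction-results')]/@class")

# Convert lxml's string results to plain str before building the column.
class_text = [str(value) for value in matches]

scraped_data = pd.DataFrame({"class_text": class_text})
print(scraped_data)
```

Each matching div ends up as its own row, and the div without 'auction-results' in its class is left out.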
Upvotes: 1
Reputation: 24930
Try it this way:
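import pandas as pd

columns = ['class_text']
rows = []
for result in results:
    # result['class'] is a list of class tokens; join them into one string
    rows.append(' '.join(result['class']))
# pass rows directly so that each matching div becomes its own row
scraped_data = pd.DataFrame(rows, columns=columns)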
scraped_data
Output:
class_text
0 auction-results high-bid has-price
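For reference, a self-contained sketch of the same idea from parsing to DataFrame, in case it helps to run it outside the original session. The `html_doc` string is a hypothetical sample (the extra divs are assumptions, not from the question). BeautifulSoup stores `class` as a list of tokens, which is why the tokens are joined back together with spaces.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup: two matching divs and one non-matching div.
html_doc = """
<div class="auction-results high-bid has-price">
  <i class="icon"></i>
  <span class="lot-price"> $700,000</span>
</div>
<div class="auction-results no-sale"></div>
<div class="lot-details"></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Substring match on the class attribute, as in the question.
results = soup.select('div[class*="auction-results"]')

# Each tag's "class" is a list such as ['auction-results', 'high-bid', 'has-price'];
# join the tokens to recover the full class string.
class_text = [" ".join(tag.get("class", [])) for tag in results]

# One row per matching div.
scraped_data = pd.DataFrame({"class_text": class_text})
print(scraped_data)
```

Building the DataFrame from a dict of lists keeps one row per match regardless of how many divs are found.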
Upvotes: 1