Reputation: 41
I'm pretty new to Python. I've had some success experimenting with web scraping using BS4, and I'm now working on a personal project that indexes AutoTrader listings from a saved HTML file.
So far I can scrape every value I need except one, and I haven't found a solution by searching.
I need to extract the province "BC" from data-payment-province="BC"
in the HTML below:
<div class="asLowAs payment-tag-disclaimer" data-payment-tag-adid="66736200" data-payment-province="BC" data-payment-tag-isnew="False" style="display: none" data-toggle="popover">
I've used location = soup.find_all('div', class_='data-payment-province')
but it returns []
I'm probably missing something obvious, but I'm honestly stumped.
Also (I should probably ask this as a separate question), does anyone know how to get only the text values as output, instead of the HTML tags along with the values?
For example:
Current:
itemOffered = soup.find_all("span", itemprop="itemOffered")
OUTPUT:
</span>, <span itemprop="itemOffered">
2019 Hyundai Elantra GT | Bluetooth | Backup Camera | Heated Seats | Blind
Desired OUTPUT:
2019 Hyundai Elantra GT
Upvotes: 0
Views: 66
Reputation: 719
A much cleaner approach would be this:
divs = soup.find_all('div')
for div in divs:
    # has_attr (singular) is the Tag method that tests for an attribute
    if div.has_attr('data-payment-province'):
        print(div['data-payment-province'])
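If you'd rather not loop over every div, find_all can filter on the attribute itself. Here is a minimal sketch, assuming the div from the question is embedded in a string called html (the second, empty div is made up just to show that it gets skipped):

```python
from bs4 import BeautifulSoup

# Sample markup: the first div is copied from the question,
# the second is a made-up div without the attribute.
html = '''
<div class="asLowAs payment-tag-disclaimer" data-payment-tag-adid="66736200" data-payment-province="BC" data-payment-tag-isnew="False" style="display: none" data-toggle="popover"></div>
<div class="something-else"></div>
'''

soup = BeautifulSoup(html, "html.parser")

# Passing True as the value matches any tag that has the attribute,
# whatever its value is.
divs = soup.find_all("div", attrs={"data-payment-province": True})
provinces = [div["data-payment-province"] for div in divs]

# The same filter written as a CSS attribute selector.
provinces_css = [div["data-payment-province"]
                 for div in soup.select("div[data-payment-province]")]

print(provinces)
print(provinces_css)
```

Both lists should contain only the province of the first div. The attrs dictionary is needed here because data-payment-province contains hyphens, so it can't be written as a Python keyword argument.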
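And to get the text of elements you can use this:
elements = soup.find_all(['span', 'element1', 'element2'])
for element in elements:
    # all text inside the element, including text in nested tags
    fulltextofelement = element.get_text(strip=True)
    # first text node that is a direct child of the element (None if there is none)
    onlyparenttext = element.find(string=True, recursive=False)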
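For the itemOffered case in the question, something like the following might work. It is only a sketch: the span markup is reconstructed from the output shown in the question, and it assumes the vehicle name always comes before the first "|" separator, which may not hold for every listing.

```python
from bs4 import BeautifulSoup

# Markup reconstructed from the question's output; the real page may differ.
html = '''
<span itemprop="itemOffered">
2019 Hyundai Elantra GT | Bluetooth | Backup Camera | Heated Seats | Blind
</span>
'''

soup = BeautifulSoup(html, "html.parser")

titles = []
for span in soup.find_all("span", itemprop="itemOffered"):
    # get_text returns only the text, with no tags; strip=True trims whitespace
    text = span.get_text(strip=True)
    # Assumption: everything before the first "|" is the vehicle name
    titles.append(text.split("|")[0].strip())

print(titles)
```

If you want the whole line including the feature list, drop the split and keep text as it is.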
Upvotes: 0
Reputation: 524
Give this a shot for your first problem:
import requests
from bs4 import BeautifulSoup
import re
.....
province_re = re.compile(r'[A-Z]{2}')
location = soup.find_all('div', {'data-payment-province': province_re})
for loc in location:
    print(loc.attrs['data-payment-province'])
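To see why the original attempt returned [], and how the regex behaves, here is a small self-contained sketch. The second div and its "UNKNOWN" value are invented for the comparison. As far as I know, BeautifulSoup tests a compiled pattern with search(), so an unanchored [A-Z]{2} also accepts longer values; anchoring it with ^ and $ restricts it to exactly two capital letters.

```python
import re
from bs4 import BeautifulSoup

# First div is a trimmed version of the one in the question;
# the second, with the value "UNKNOWN", is invented for comparison.
html = '''
<div class="asLowAs payment-tag-disclaimer" data-payment-province="BC"></div>
<div class="asLowAs payment-tag-disclaimer" data-payment-province="UNKNOWN"></div>
'''

soup = BeautifulSoup(html, "html.parser")

# class_ only looks at the class attribute, and no div has a class
# named "data-payment-province", so this is empty.
by_class = soup.find_all("div", class_="data-payment-province")

# Unanchored pattern: matches any value containing two capitals in a row.
loose_re = re.compile(r"[A-Z]{2}")
loose = [d["data-payment-province"]
         for d in soup.find_all("div", {"data-payment-province": loose_re})]

# Anchored pattern: the whole value must be exactly two capitals.
strict_re = re.compile(r"^[A-Z]{2}$")
strict = [d["data-payment-province"]
          for d in soup.find_all("div", {"data-payment-province": strict_re})]

print(by_class)
print(loose)
print(strict)
```

Whether the stricter pattern is worth it depends on what values actually show up in that attribute on the real pages.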
Upvotes: 1