Reputation: 19
Goal
Extract a business's working hours and its closed status from Google Search results.
[Screenshot: search result with the working hours and closed status highlighted (example URL)]
[Screenshot: working hours highlighted in the popup (example URL)]
Problem
soup.find() with the specific selector returns None.
Description
I am trying to create a voice-activated AI, similar to Google Home or Alexa, that I can pair up with something cool. Currently, I'm trying to use data from the Google Knowledge Panel for specific search queries.
Code
import bs4
import requests


def service(self, business):
    # Search for "<business> hours" and fetch the results page
    url = requests.get("https://www.google.com/search?q={}+hours".format(business))
    outputs = []
    if url.status_code == 200:
        soup = bs4.BeautifulSoup(url.text, "lxml")
        # The span with this class holds the hours shown for the day, or just "Closed"
        string = soup.find("span", attrs={"class": "TLou0b JjSWRd"})
        print(string)
        # returns None
    if url.status_code == 404:
        print("Error")
        return "Error 404"
How can I extract the business's working hours and its closed status?
P.S. I'm on a Raspberry Pi 4, so I'd rather not use Selenium and its drivers, but I'm open to suggestions.
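One detail in the question's code is worth fixing regardless of the selector: the query is built with plain string formatting, so business names containing spaces, "&", "'" or "#" are not URL-encoded, and an "&" would split the query string. Below is a minimal stdlib sketch, assuming you only need the URL for a "<business> hours" search; build_search_url is a hypothetical helper, not part of requests or bs4.

```python
from urllib.parse import urlencode

# Base endpoint taken from the question's code
GOOGLE_SEARCH_URL = "https://www.google.com/search"


def build_search_url(business: str) -> str:
    """Return a Google search URL for '<business> hours' with the query URL-encoded.

    build_search_url is a hypothetical helper, not part of requests or bs4.
    """
    # urlencode percent-escapes characters such as ' & # and turns spaces into '+'
    query = urlencode({"q": f"{business} hours"})
    return f"{GOOGLE_SEARCH_URL}?{query}"


if __name__ == "__main__":
    print(build_search_url("Joe's Pizza"))
    print(build_search_url("Ben & Jerry's"))
```

With requests, passing the query as requests.get(GOOGLE_SEARCH_URL, params={"q": f"{business} hours"}) has the same effect, since requests encodes params for you. It may also be worth sending a browser-like User-Agent header, on the assumption (not verified here) that Google can serve different markup to non-browser clients.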
Upvotes: 1
Views: 946
Reputation: 1414
Selector for the business hours: [data-attrid='kc:/location/location:hours'] table tr.
.TLou0b.JjSWRd is a selector for the Google Answer Box.
From what I understand, you're looking for the business hours from the Google Knowledge Panel.
Code to extract business hours:
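import logging

logger = logging.getLogger(__name__)


def extract_business_hours(soup):
    # The Knowledge Panel's hours block is tagged with this data-attrid
    hours_wrapper_node = soup.select_one("[data-attrid='kc:/location/location:hours']")
    if hours_wrapper_node is None:
        logger.info("Business hours node is not found")
        return None

    business_hours = {"open_closed_state": "", "hours": []}

    # Current "Open"/"Closed" label shown next to the hours
    state_node = hours_wrapper_node.select_one(".JjSWRd span span span")
    if state_node is not None:
        business_hours["open_closed_state"] = state_node.text.strip()

    # Each table row holds one day: <td>day</td><td>hours</td>
    for location_hours_rows_node in hours_wrapper_node.select("table tr"):
        cells = [td.text.strip() for td in location_hours_rows_node.select("td")]
        if len(cells) != 2:
            continue  # skip rows that don't match the day/hours layout
        day_of_week, hours = cells
        business_hours["hours"].append(
            {"day_of_week": day_of_week, "business_hours": hours}
        )

    return business_hours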
Output:
{
    "hours": [
        {"business_hours": "5:30–10PM", "day_of_week": "Wednesday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Thursday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Friday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Saturday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Sunday"},
        {"business_hours": "Closed", "day_of_week": "Monday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Tuesday"},
    ],
    "open_closed_state": "Closed",
}
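Since the goal is a voice assistant, the extracted dict will usually need to answer something like "what are the hours today?". Below is a minimal stdlib sketch, assuming the dict has exactly the shape shown in the output above and that Google labels days with English weekday names; hours_for_day, hours_today and WEEKDAYS are hypothetical names introduced for illustration.

```python
from datetime import date
from typing import Optional

# Sample data in the shape produced by the answer's code (copied from its output)
BUSINESS_HOURS = {
    "hours": [
        {"business_hours": "5:30–10PM", "day_of_week": "Wednesday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Thursday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Friday"},
        {"business_hours": "5:30–11PM", "day_of_week": "Saturday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Sunday"},
        {"business_hours": "Closed", "day_of_week": "Monday"},
        {"business_hours": "5:30–10PM", "day_of_week": "Tuesday"},
    ],
    "open_closed_state": "Closed",
}

# English names indexed by date.weekday() (Monday == 0); assumes Google's labels are English.
# A fixed list avoids depending on the system locale, unlike strftime("%A").
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def hours_for_day(business_hours: dict, day_of_week: str) -> Optional[str]:
    """Return the hours string for a weekday name, or None if that day is not listed."""
    for entry in business_hours["hours"]:
        # Case-insensitive match so "friday" and "Friday" both work
        if entry["day_of_week"].lower() == day_of_week.lower():
            return entry["business_hours"]
    return None


def hours_today(business_hours: dict, today: Optional[date] = None) -> Optional[str]:
    """Return the hours string for today (or for the given date)."""
    today = today or date.today()
    return hours_for_day(business_hours, WEEKDAYS[today.weekday()])


if __name__ == "__main__":
    print(hours_for_day(BUSINESS_HOURS, "friday"))
    print(hours_today(BUSINESS_HOURS))
```

The hours are kept as the raw strings Google displays (for example "5:30–10PM" or "Closed") rather than parsed into times, because the sample output does not show how other formats, such as ranges crossing noon, are written.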
Upvotes: 1