Reputation: 91
I am trying to scrape from this URL, which returns a JSON file.
The page loads in less than a second in my browser but takes about 10 seconds with requests. Any suggestions on why it takes so long and how to speed it up?
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
URL = 'https://www.lowes.com/IntegrationServices/resources/storeLocator/json/v2_0/stores?langId=-1&storeId=10702&catalogId=10051&place=10001&count=25'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
site_json = json.loads(soup.text)  # parse the JSON text into a dict
df = pd.DataFrame.from_dict(site_json)
first_row = pd.Series(df.iloc[0]['Location'])  # first store's 'Location' field as a Series
print(first_row)
(I am also aware that I am probably doing extra steps when converting it to a DataFrame; I am used to scraping HTML pages, and this approach still works.)
Upvotes: 2
Views: 1608
Reputation: 546
For me, changing the user-agent seems to fix the issue, e.g.:
headers = {
    "User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"
}
Perhaps Lowes' API server is delaying or deprioritizing responses to unrecognized/uncommon user-agents. A list of current Chrome user-agent values can be found here.
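For reference, here is a minimal sketch of the whole request with that user-agent. Since the endpoint already returns JSON, response.json() can replace the BeautifulSoup/json.loads steps; the 'Location' key and the DataFrame conversion are just carried over from your code and may need adjusting to the actual response shape:
import requests
import pandas as pd

URL = 'https://www.lowes.com/IntegrationServices/resources/storeLocator/json/v2_0/stores?langId=-1&storeId=10702&catalogId=10051&place=10001&count=25'
headers = {
    "User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"
}

response = requests.get(URL, headers=headers, timeout=15)
response.raise_for_status()   # fail fast on HTTP errors
site_json = response.json()   # parse the JSON body directly, no BeautifulSoup needed

df = pd.DataFrame.from_dict(site_json)
first_row = pd.Series(df.iloc[0]['Location'])  # assumes the same 'Location' key as in the question
print(first_row)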
Upvotes: 2