Reputation: 5152
I'm using Python requests to search the site https://www.investing.com/ for the terms "Durable Goods Orders US".
I checked the "Network" tab of the inspect panel, and it seems the search is simply done with the following form data: 'quotes_search_text':'Durable Goods Orders US'
So I tried with python:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.investing.com/'
data = {'quotes_search_text': 'Durable Goods Orders US'}
resp = requests.post(URL, data=data, headers={'User-Agent': 'Mozilla/5.0', 'X-Requested-With': 'XMLHttpRequest'})
However, this doesn't return the results that I can see when doing it manually. All the search results should have "gs-title" as a class attribute (as per the page inspection), but when I do:
soup = BeautifulSoup(resp.text, 'html.parser')
soup.select(".gs-title")
I see no results. Is there some aspect of the POST request that I am not taking into account? (I'm a complete noob here.)
Upvotes: 0
Views: 373
Reputation: 1706
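After going over this in detail in the chat, there are many changes. The search results are not in the HTML the server returns; they are loaded by JavaScript on their end, which queries Google Custom Search. To retrieve the information you're looking for, you need to make that same request yourself. You can change the query variable to whatever you want.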
import requests
import json
from urllib.parse import quote_plus
URL = 'https://www.googleapis.com/customsearch/v1element'
query = 'Durable Goods Orders US'
query_formatted = quote_plus(query)
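data = {
    'key': 'AIzaSyCVAXiUzRYsML1Pv6RwSG1gunmMikTzQqY',
    'num': 10,
    'hl': 'en',
    'prettyPrint': 'true',
    'source': 'gcsc',
    'gss': '.com',
    'cx': '015447872197439536574:fy9sb1kxnp8',
    # Pass the raw query here: requests URL-encodes params itself,
    # so a pre-quoted string would be encoded twice.
    'q': query,
    'googlehost': 'www.google.com'
}
headers = {
    'User-Agent': 'Mozilla/5.0',
    # The Referer is a plain header string, so it needs the quoted form.
    'Referer': 'https://www.investing.com/search?q=' + query_formatted,
}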
resp = requests.get(URL, params=data, headers=headers)
j = json.loads(resp.text)
# print(resp.text)
for r in j['results']:
    print(r['title'], r['url'])
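One thing to watch with the params argument: as far as I know, requests URL-encodes the values for you, so a string that has already been through quote_plus gets encoded a second time. Here is a minimal sketch of the effect. It uses urlencode from the standard library as a stand-in for what requests does, and that equivalence is my assumption for this simple case:

```python
from urllib.parse import quote_plus, urlencode

query = 'Durable Goods Orders US'

# Encoding the raw query once turns each space into '+'.
encoded_once = urlencode({'q': query})

# Quoting first and then encoding again escapes the '+' signs themselves,
# so the server would see literal plus characters in the search terms.
encoded_twice = urlencode({'q': quote_plus(query)})

print(encoded_once)   # q=Durable+Goods+Orders+US
print(encoded_twice)  # q=Durable%2BGoods%2BOrders%2BUS
```

So the quoted form belongs in strings you build by hand (such as the Referer header), while values handed to params should stay unquoted.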
Upvotes: 1