Reputation: 25
I am new to Python and am trying to make a series of programs that text my phone the performance of stock market indices. I have a few programs working with limited functionality, and I believe they'd be improved if the data could be scraped from Google, which I have not been able to do so far. The values I am trying to pull are at the top of the results page every time, almost in a table of some sort. At the bottom, I have attached a snip of the value I am trying to scrape.
Here is the web scraping portion of my code so far. I am using Beautiful Soup and requests.
import bs4
import requests
res = requests.get('https://www.google.com/search?safe=active&sxsrf=ALeKk00d7WrRTMvmhypG20E5MOEWpRwKlw%3A1601591747498&ei=w1l2X52DHoPatAXElILQDg&q=nasdaq+composite&oq=nasd&gs_lcp=CgZwc3ktYWIQAxgAMgwIIxAnEJ0CEEYQ-gEyBAgjECcyBAgjECcyCggAELEDEIMBEEMyCggAELEDEIMBEEMyCAgAELEDEIMBMgcIABCxAxBDMgcIABCxAxBDMgcIABCxAxBDMgoIABCxAxCDARBDOgQIABBHOgUIABCxAzoHCCMQ6gIQJzoHCCMQJxCdAjoECAAQQ1DszQ5Y9tkOYPjkDmgBcAJ4BYABqQOIAZ4NkgEJMS43LjEuMC4xmAEAoAEBqgEHZ3dzLXdperABCsgBCMABAQ&sclient=psy-ab')
type(res)
soup = bs4.BeautifulSoup(res.text,'lxml')
type(soup)
Current_Level = soup.find(class_='IsqQVc_NprOob_XcVN5d')
print (Current_Level)
The link leads to the page you get if you google "Nasdaq composite". The class I am using for soup.find() is the one shown when you right-click the value on the web page and choose Inspect.
Open to all solutions for getting this value. Thanks for all of your help, greatly appreciated!
[Image of the value I'm trying to scrape]
Upvotes: 2
Views: 3635
Reputation: 1734
Make sure you're using a user-agent, otherwise Google will eventually block your request and you'll receive completely different HTML, with some sort of error, that contains different selectors and elements. Check what your user-agent is.
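If you're not sure what your script is sending, you can print the default headers that requests uses; out of the box the user-agent is something like python-requests/2.x, which is easy to flag as a bot (a quick check, not part of the original answer's code):

import requests

# By default requests announces itself as "python-requests/<version>",
# which Google can easily recognize as a script rather than a browser.
print(requests.utils.default_headers()["User-Agent"])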
Update 2022.03.10:
If this script was throwing an error for people who used this answer in the past, it was for two reasons:
1. An outdated user-agent. Because of it, Google or another website might assume that a request made with an old user-agent is likely coming from a script or a bot.
2. Outdated beautifulsoup selectors.
Code and example in the online IDE:
import requests, lxml
from bs4 import BeautifulSoup

# A real browser user-agent so Google doesn't serve the blocked/error page.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
}

# Search query and interface language, passed as URL parameters.
params = {
    "q": "Nasdaq composite",
    "hl": "en"
}

soup = BeautifulSoup(requests.get('https://www.google.com/search', headers=headers, params=params).text, 'lxml')

print(soup.select_one('.wT3VGc').text)
# 13,089.08
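Since the question mentions building this into a series of programs, the request above can also be wrapped in a small reusable function. This is only a sketch based on the code in this answer; get_index_level is a made-up name, and the .wT3VGc selector may stop working whenever Google changes its markup:

import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
}

def get_index_level(query):
    # Ask Google for the query and parse the result with lxml.
    params = {"q": query, "hl": "en"}
    res = requests.get("https://www.google.com/search", headers=HEADERS, params=params)
    res.raise_for_status()
    soup = BeautifulSoup(res.text, "lxml")
    element = soup.select_one(".wT3VGc")
    if element is None:
        raise ValueError("Selector .wT3VGc not found; Google may have changed its markup.")
    return element.text

# print(get_index_level("Nasdaq composite"))  # e.g. 13,089.08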
Upvotes: -2
Reputation: 20118
Try using the class name BNeawe iBp4i AP7Wnd:
import requests
from bs4 import BeautifulSoup
res = requests.get(
    "https://www.google.com/search?safe=active&sxsrf=ALeKk00d7WrRTMvmhypG20E5MOEWpRwKlw%3A1601591747498&ei=w1l2X52DHoPatAXElILQDg&q=nasdaq+composite&oq=nasd&gs_lcp=CgZwc3ktYWIQAxgAMgwIIxAnEJ0CEEYQ-gEyBAgjECcyBAgjECcyCggAELEDEIMBEEMyCggAELEDEIMBEEMyCAgAELEDEIMBMgcIABCxAxBDMgcIABCxAxBDMgcIABCxAxBDMgoIABCxAxCDARBDOgQIABBHOgUIABCxAzoHCCMQ6gIQJzoHCCMQJxCdAjoECAAQQ1DszQ5Y9tkOYPjkDmgBcAJ4BYABqQOIAZ4NkgEJMS43LjEuMC4xmAEAoAEBqgEHZ3dzLXdperABCsgBCMABAQ&sclient=psy-ab"
)
soup = BeautifulSoup(res.text, "lxml")
# Using `.split()` to remove `+159.00 (1.42%)` from the output
Current_Level = soup.find(class_="BNeawe iBp4i AP7Wnd").text.split('+')[0]
print(Current_Level)
Output:
11,332.49
Edit:
If you call soup.prettify(), you can see that the data is under the class BNeawe iBp4i AP7Wnd:
soup = BeautifulSoup(res.text, "lxml")
print(soup.prettify())
...
...
<div>
 <div>
  <div>
   <div class="kCrYT">
    <div>
     <div>
      <div>
       <div class="BNeawe iBp4i AP7Wnd">
        <div>
         <div class="BNeawe iBp4i AP7Wnd">
          11,332.49
          <span class="rQMQod AWuZUe">
           +257.47 (2.32%)
          </span>
         </div>
        </div>
       </div>
      </div>
     </div>
...
...
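Based on the prettify output above, the day's change sits in its own span with class rQMQod AWuZUe, so if you ever want it separately you could grab that element instead of splitting the combined text (a sketch reusing the same soup object; these class names are simply whatever Google served at the time):

# Reusing the soup object from the code above.
change = soup.find("span", class_="rQMQod AWuZUe")
if change is not None:
    print(change.text)  # e.g. +257.47 (2.32%)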
Upvotes: 1
Reputation: 101
Here is how you can get the value.
from bs4 import BeautifulSoup as soup
import requests

url_to_scrape = "https://www.google.com/search?q=nasdaq+composite&oq=nasdaq+composite&aqs=chrome.0.0l8.5126j1j4&sourceid=chrome&ie=UTF-8"

try:
    client_page = requests.get(url_to_scrape)
except requests.RequestException:
    raise SystemExit("Request aborted due to unknown reason!")

page_html = client_page.text
client_page.close()

page_soup = soup(page_html, "html.parser")
nasdaqValue = page_soup.findAll("div", {"class": "BNeawe iBp4i AP7Wnd"})
print(nasdaqValue[0].text)
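One caveat, not from the original answer: if Google serves a different page layout (see the user-agent discussion above), findAll returns an empty list and nasdaqValue[0] raises an IndexError, so a small guard can make the script fail more gracefully:

nasdaqValue = page_soup.findAll("div", {"class": "BNeawe iBp4i AP7Wnd"})
if nasdaqValue:
    print(nasdaqValue[0].text)
else:
    print("Value not found - Google may have returned a different page layout.")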
Upvotes: 2