JG71273

Reputation: 25

Web Scraping Values on Google Search Results Page. Python, BeautifulSoup, Requests

I am new to Python and am trying to write a series of programs that text my phone the performance of stock market indices. I have a few programs working with limited functionality, and I believe they would improve if the data could be scraped from Google, which I have not managed to do so far. The values I am trying to pull always appear at the top of the results page, laid out almost like a table. At the bottom, I have attached a snip showing the value I am trying to scrape.

Here is the web-scraping portion of my current code. I am using Beautiful Soup and Requests.

import bs4
import requests

res = requests.get('https://www.google.com/search?safe=active&sxsrf=ALeKk00d7WrRTMvmhypG20E5MOEWpRwKlw%3A1601591747498&ei=w1l2X52DHoPatAXElILQDg&q=nasdaq+composite&oq=nasd&gs_lcp=CgZwc3ktYWIQAxgAMgwIIxAnEJ0CEEYQ-gEyBAgjECcyBAgjECcyCggAELEDEIMBEEMyCggAELEDEIMBEEMyCAgAELEDEIMBMgcIABCxAxBDMgcIABCxAxBDMgcIABCxAxBDMgoIABCxAxCDARBDOgQIABBHOgUIABCxAzoHCCMQ6gIQJzoHCCMQJxCdAjoECAAQQ1DszQ5Y9tkOYPjkDmgBcAJ4BYABqQOIAZ4NkgEJMS43LjEuMC4xmAEAoAEBqgEHZ3dzLXdperABCsgBCMABAQ&sclient=psy-ab')
type(res)
soup = bs4.BeautifulSoup(res.text,'lxml')
type(soup)

Current_Level = soup.find(class_='IsqQVc_NprOob_XcVN5d')

print (Current_Level)

The link leads to the page you get if you google "Nasdaq composite". The class I pass to soup.find() is the one shown for the value when I right-click it on the web page and choose Inspect.
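A quick check worth doing first, offered as a sketch: the browser inspector shows the page your browser rendered, while `requests` receives whatever HTML Google serves to its own (different) user-agent, so a class copied from the inspector may simply not exist in `res.text`. The helper name `has_class` is hypothetical; it only relies on BeautifulSoup's `find(class_=...)`.

```python
from bs4 import BeautifulSoup


def has_class(html: str, class_name: str) -> bool:
    """Return True if any element in `html` lists `class_name` among its classes.

    Hypothetical helper: pass it `res.text` to see whether a class copied
    from the browser inspector actually appears in the HTML requests received.
    """
    soup = BeautifulSoup(html, "html.parser")
    # A single class name passed to class_ matches any element whose
    # class list contains that name, regardless of its other classes.
    return soup.find(class_=class_name) is not None


if __name__ == "__main__":
    sample = '<div class="BNeawe iBp4i AP7Wnd">11,332.49</div>'
    print(has_class(sample, "iBp4i"))   # True
    print(has_class(sample, "IsqQVc"))  # False
```

With the question's code, `has_class(res.text, "IsqQVc")` returning False would mean the inspector's class is not in the HTML that `requests` got back.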

I'm open to any solution that gets this value. Thanks for all of your help, greatly appreciated!

Image of the value I'm trying to scrape

Upvotes: 2

Views: 3635

Answers (3)

Dmitriy Zub

Reputation: 1734

Make sure you're sending a user-agent; otherwise Google will eventually block your request, and you'll receive completely different HTML (some sort of error page) with different selectors and elements. Check what your user-agent is.
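To see what `requests` sends when you don't set the header, you can print its default user-agent; a `Session` lets you set a browser user-agent once for every request it makes. A minimal sketch; the browser string is copied from this answer's code and will go stale like any other.

```python
import requests

# Without a User-Agent header, requests identifies itself as
# "python-requests/<version>", which makes scripted traffic easy to spot.
print(requests.utils.default_user_agent())

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
)

session = requests.Session()
# Every request made through this session now carries the browser user-agent.
session.headers.update({"User-Agent": BROWSER_UA})
print(session.headers["User-Agent"])

# e.g. session.get("https://www.google.com/search",
#                  params={"q": "Nasdaq composite", "hl": "en"}, timeout=30)
```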

Update 2022.03.10:

If this script threw an error for past readers of this answer, it was for two reasons:

  1. An old user-agent. Google (or another website) may assume that a request made with an outdated user-agent comes from a script or bot.
  2. Google changed the CSS selector, so the old selector no longer matches anything BeautifulSoup can find.
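Because both failure modes show up as a selector that no longer matches, one hedged approach is to try several candidate selectors in order and return None when none match, instead of crashing on `None.text`. The helper `first_match_text` is hypothetical, and the two selectors are simply the ones that appear in the answers on this page; either may already be stale.

```python
from typing import Iterable, Optional

from bs4 import BeautifulSoup

# Selectors seen in answers on this page; Google changes these without notice.
CANDIDATE_SELECTORS = (".wT3VGc", "div.BNeawe.iBp4i.AP7Wnd")


def first_match_text(html: str, selectors: Iterable[str] = CANDIDATE_SELECTORS) -> Optional[str]:
    """Return the stripped text of the first element matched by any selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in selectors:
        element = soup.select_one(selector)
        if element is not None:
            return element.get_text(strip=True)
    # Nothing matched: possibly a block/consent page, or the markup changed again.
    return None


if __name__ == "__main__":
    print(first_match_text('<div class="wT3VGc">13,089.08</div>'))  # 13,089.08
    print(first_match_text("<html><body>Unusual traffic</body></html>"))  # None
```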

Code and example in the online IDE:

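import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below requires the lxml package

# A current desktop-browser user-agent; without one Google may serve a different (error) page
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
}

params = {
    "q": "Nasdaq composite",  # search query
    "hl": "en"                # interface language
}

response = requests.get('https://www.google.com/search', headers=headers, params=params, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')

# .wT3VGc was the class of the index value at the time of the 2022.03.10 update
value = soup.select_one('.wT3VGc')
print(value.text if value is not None else "Selector not found; Google may have changed the markup")

# 13,089.08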

Upvotes: -2

MendelG

Reputation: 20118

Try using the class name BNeawe iBp4i AP7Wnd:

import requests
from bs4 import BeautifulSoup

res = requests.get(
    "https://www.google.com/search?safe=active&sxsrf=ALeKk00d7WrRTMvmhypG20E5MOEWpRwKlw%3A1601591747498&ei=w1l2X52DHoPatAXElILQDg&q=nasdaq+composite&oq=nasd&gs_lcp=CgZwc3ktYWIQAxgAMgwIIxAnEJ0CEEYQ-gEyBAgjECcyBAgjECcyCggAELEDEIMBEEMyCggAELEDEIMBEEMyCAgAELEDEIMBMgcIABCxAxBDMgcIABCxAxBDMgcIABCxAxBDMgoIABCxAxCDARBDOgQIABBHOgUIABCxAzoHCCMQ6gIQJzoHCCMQJxCdAjoECAAQQ1DszQ5Y9tkOYPjkDmgBcAJ4BYABqQOIAZ4NkgEJMS43LjEuMC4xmAEAoAEBqgEHZ3dzLXdperABCsgBCMABAQ&sclient=psy-ab",
    timeout=30,
)
res.raise_for_status()

soup = BeautifulSoup(res.text, "lxml")

# The first matching div holds the level followed by the change, e.g. "11,332.49 +257.47 (2.32%)".
# `.split('+')[0]` keeps only the level and `.strip()` drops surrounding whitespace.
# On a down day the change starts with a minus sign, so this split would not remove it.
Current_Level = soup.find(class_="BNeawe iBp4i AP7Wnd").text.split('+')[0].strip()
print(Current_Level)

Output:

11,332.49
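Since `.split('+')` only handles a positive change, a regular expression can pull out the level, change and percentage regardless of sign. A sketch assuming the text has the shape shown in the Edit below (`11,332.49 +257.47 (2.32%)`); the names `parse_quote` and `Quote` are hypothetical, and the pattern also accepts a Unicode minus sign in case Google renders one. The percentage keeps whatever sign Google prints.

```python
import re
from typing import NamedTuple, Optional


class Quote(NamedTuple):
    level: float
    change: float
    percent: float


# Level, then a signed change, then a parenthesized percentage.
# [-\u2212+] accepts an ASCII minus, a Unicode minus sign, or a plus sign.
QUOTE_RE = re.compile(
    r"(?P<level>[\d,]+\.\d+)\s*(?P<change>[-\u2212+][\d,]+\.\d+)\s*\((?P<percent>[-\u2212+]?[\d.]+)%\)"
)


def _to_float(text: str) -> float:
    # Drop thousands separators and normalize the Unicode minus before converting.
    return float(text.replace(",", "").replace("\u2212", "-"))


def parse_quote(text: str) -> Optional[Quote]:
    """Parse 'level change (percent%)' text into numbers; return None if it does not match."""
    match = QUOTE_RE.search(text)
    if match is None:
        return None
    return Quote(
        level=_to_float(match["level"]),
        change=_to_float(match["change"]),
        percent=_to_float(match["percent"]),
    )


if __name__ == "__main__":
    print(parse_quote("11,332.49 +257.47 (2.32%)"))
    # Quote(level=11332.49, change=257.47, percent=2.32)
```

Having floats rather than strings also makes it easy to format the numbers however you like before texting them.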

Edit:

If you print soup.prettify(), you can see that the data sits under the class BNeawe iBp4i AP7Wnd:

soup = BeautifulSoup(res.text, "lxml")
print(soup.prettify())
...
...
<div>
      <div>
       <div>
        <div class="kCrYT">
         <div>
          <div>
           <div>
            <div class="BNeawe iBp4i AP7Wnd">
             <div>
              <div class="BNeawe iBp4i AP7Wnd">
               11,332.49
               <span class="rQMQod AWuZUe">
                +257.47 (2.32%)
               </span>
              </div>
             </div>
            </div>
           </div>
          </div>
...
...
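Note that `find(class_="BNeawe iBp4i AP7Wnd")` compares against the exact class string, so it stops matching if Google adds or reorders a class, whereas the CSS selector `div.BNeawe.iBp4i.AP7Wnd` only requires all three classes to be present. A small offline sketch adapted from the excerpt above; the `extra` class is made up purely to show the difference.

```python
from bs4 import BeautifulSoup

# Adapted from the prettify() excerpt; "extra" is a hypothetical added class.
html = """
<div class="BNeawe iBp4i AP7Wnd extra">
  <div><div class="BNeawe iBp4i AP7Wnd">11,332.49
    <span class="rQMQod AWuZUe">+257.47 (2.32%)</span></div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact string match: skips the outer div because its class string has an extra class.
exact = soup.find(class_="BNeawe iBp4i AP7Wnd")

# CSS selector: matches every div carrying all three classes, in any order.
all_matches = soup.select("div.BNeawe.iBp4i.AP7Wnd")

print(exact.get_text(" ", strip=True))  # 11,332.49 +257.47 (2.32%)
print(len(all_matches))                 # 2
```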

Upvotes: 1

Mohsin Ali

Reputation: 101

Here is how you can get the value.

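from bs4 import BeautifulSoup as soup
import requests
import sys

url_to_scrape = "https://www.google.com/search?q=nasdaq+composite&oq=nasdaq+composite&aqs=chrome.0.0l8.5126j1j4&sourceid=chrome&ie=UTF-8"

try:
    client_page = requests.get(url_to_scrape, timeout=30)
    client_page.raise_for_status()
except requests.RequestException as exc:
    # Without a response there is nothing to parse, so stop here
    sys.exit(f"Request failed: {exc}")

page_html = client_page.text
client_page.close()

page_soup = soup(page_html, "html.parser")

# find_all returns every div whose class attribute is exactly "BNeawe iBp4i AP7Wnd"
nasdaqValue = page_soup.find_all("div", {"class": "BNeawe iBp4i AP7Wnd"})
print(nasdaqValue[0].text if nasdaqValue else "Value not found; Google may have changed the markup")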
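In the `prettify()` excerpt posted in another answer on this page, the class appears on both an outer and an inner `div`, so `nasdaqValue[0].text` returns the level together with the change. If you want only the level, one option is to take the first string from `stripped_strings`, which yields the element's text nodes in document order with whitespace trimmed. A sketch on an inline sample; the sample HTML and the helper name `index_level` are assumptions.

```python
from typing import Optional

from bs4 import BeautifulSoup


def index_level(html: str) -> Optional[str]:
    """Return just the index level (first text node) of the first matching div, or None."""
    page_soup = BeautifulSoup(html, "html.parser")
    matches = page_soup.find_all("div", {"class": "BNeawe iBp4i AP7Wnd"})
    if not matches:
        return None
    # stripped_strings yields each text node with surrounding whitespace removed;
    # the level comes first, the <span> holding the change comes after it.
    return next(matches[0].stripped_strings, None)


if __name__ == "__main__":
    sample = (
        '<div class="BNeawe iBp4i AP7Wnd"><div><div class="BNeawe iBp4i AP7Wnd">'
        '11,332.49 <span class="rQMQod AWuZUe">+257.47 (2.32%)</span></div></div></div>'
    )
    print(index_level(sample))  # 11,332.49
```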

Upvotes: 2
