sagar

Reputation: 801

Web Scraping - No content displayed

I am trying to fetch the stock price of a company that the user enters as input. I am using requests to get the page source and BeautifulSoup to scrape it. I am fetching the data from google.com, and I only want the last stock price (806.93 in the picture). When I run my script, it prints None; none of the data is being fetched. What am I missing?

enter image description here

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests

company = raw_input("Enter the company name:")

URL = "https://www.google.co.in/?gfe_rd=cr&ei=-AKmV6eqC-LH8AfRqb_4Aw#newwindow=1&safe=off&q="+company+"+stock"

request = requests.get(URL)
soup = BeautifulSoup(request.content,"lxml")

code = soup.find('span',{'class':'_Rnb fmob_pr fac-l','data-symbol':'GOOGL'})
print code.contents[0]

The source code of the page looks like this :

The source code

Upvotes: 1

Views: 1169

Answers (3)

Dmitriy Zub

Reputation: 1724

You're looking for this:

# two selectors which will handle two layouts
current_price = soup.select_one('.wT3VGc, .XcVN5d').text

Have a look at the SelectorGadget Chrome extension to grab CSS selectors by clicking on the desired element in your browser. CSS selectors reference.


It might be because there's no user-agent specified in your request headers.

The default requests user-agent is python-requests, so Google can tell that the request comes from a script rather than a "real" user visit. It may then block the request or return different HTML, with different selectors and elements, or some sort of error page. Setting a browser-like user-agent in the HTTP request headers makes the request look like a normal browser visit.

Pass user-agent into request headers:

headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}
response = requests.get('YOUR_URL', headers=headers)
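To check which user-agent will actually be sent, without making any network call, something like the following sketch may help. The names `browser_headers`, `session` and `prepared` are only illustrative, and the user-agent string is an arbitrary browser-like example rather than a required value.

```python
import requests

# What requests identifies itself as when no header is supplied,
# e.g. "python-requests/<version>".
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Illustrative browser-like header (any realistic browser string would do).
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/70.0.3538.102 Safari/537.36"
}

# Build the request without sending it, so the final headers and URL
# can be inspected locally.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request(
        "GET",
        "https://www.google.com/search",
        params={"q": "alphabet inc class a stock"},
        headers=browser_headers,
    )
)

print(prepared.headers["User-Agent"])  # the browser-like string, not python-requests
print(prepared.url)                    # the query ends up in ?q=..., which is sent to the server
```

Headers passed on the request take precedence over the session defaults, which is why the prepared request no longer announces itself as python-requests.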

Code and example in the online IDE:

import requests, lxml
from bs4 import BeautifulSoup

headers = {
  'User-agent':
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0'
}

params = {
  'q': 'alphabet inc class a stock',
  'gl': 'us'
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')


# two selectors which will handle two layouts
current_price = soup.select_one('.wT3VGc, .XcVN5d').text
print(current_price)

# 2,816.00
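Note that `select_one` returns `None` when neither selector matches (for example, if Google serves a different layout or a block page), so calling `.text` directly would raise an `AttributeError`. A minimal defensive sketch follows; `extract_price` is a hypothetical helper, and both sample HTML strings are made up for illustration, not real Google responses.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the price text, or None if neither known selector matches."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(".wT3VGc, .XcVN5d")
    if node is None:
        # Layout changed or the request was blocked: report it instead of crashing.
        return None
    return node.get_text(strip=True)


# Hypothetical snippets used only to exercise both branches.
sample_ok = '<div><span class="wT3VGc">2,816.00</span></div>'
sample_blocked = "<html><body><p>Unusual traffic detected</p></body></html>"

print(extract_price(sample_ok))
print(extract_price(sample_blocked))
```

Returning `None` makes it easier to tell "the page did not contain the element" apart from a genuine parsing bug.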

Alternatively, you can achieve the same thing by using the Google Direct Answer Box API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you only need to read the value you want from structured JSON, rather than figuring out why certain things don't work as expected and then maintaining the scraper over time.

Code to integrate:

from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_API_KEY",
  "engine": "google",
  "q": "alphabet inc class a stock",
  "gl": "us",
  "hl": "en"
}

search = GoogleSearch(params)
results = search.get_dict()

current_price = results['answer_box']['price']
print(current_price)

# 2,816.00

P.S - I wrote an in-depth blog post about how to reduce the chance of being blocked while web scraping search engines.

Disclaimer, I work for SerpApi.

Upvotes: 0

Aur

Reputation: 215

I went to https://www.google.com/?gfe_rd=cr&ei=-AKmV6eqC-LH8AfRqb_4Aw#newwindow=1&safe=off&q=+google+stock , did a right click and "View Page Source" but did not see the code that you screenshotted.

Then I typed out a section of your code screenshot and created a BeautifulSoup object with it and then ran your find on it:

test_screenshot = BeautifulSoup('<div class="_F0c" data-tmid="/m/07zln7n"><span class="_Rnb fmob_pr fac-l" data-symbol="GOOGL" data-tmid="/m/07zln7n" data-value="806.93">806.93.</span> = $0<span class ="_hgj">USD</span>')

test_screenshot.find('span',{'class':'_Rnb fmob_pr fac-l','data-symbol':'GOOGL'})

Which will output what you want: <span class="_Rnb fmob_pr fac-l" data-symbol="GOOGL" data-tmid="/m/07zln7n" data-value="806.93">806.93.</span>

This means that the code you are getting is not the code you expect to get.
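One plausible contributing factor, offered as a sketch rather than a confirmed diagnosis: in the question's URL the search term sits after the `#`, which makes it part of the URL fragment. Fragments are handled by the browser and are not sent to the server in an HTTP request, so the server never sees `q=...+stock`. The standard library can show how the URL splits (the value of `company` is just an example input):

```python
from urllib.parse import urlsplit, parse_qs

company = "google"  # example input
url = ("https://www.google.co.in/?gfe_rd=cr&ei=-AKmV6eqC-LH8AfRqb_4Aw"
       "#newwindow=1&safe=off&q=" + company + "+stock")

parts = urlsplit(url)

# Query string: this part is sent to the server.
print(parts.query)

# Fragment: this part stays on the client side.
print(parts.fragment)

# The search term is not among the query parameters.
query_params = parse_qs(parts.query)
print("q" in query_params)

# The URL as the server effectively receives it, with the fragment dropped.
sent_url = parts._replace(fragment="").geturl()
print(sent_url)
```

Passing the search term as a real query parameter (for example through the `params` argument of `requests.get`) avoids this particular issue.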

I suggest using the Google Finance page: https://www.google.com/finance?q=google (replace 'google' with what you want to search), which will give you what you are looking for:

request = requests.get(URL)
soup = BeautifulSoup(request.content,"lxml")
code = soup.find("span",{'class':'pr'})
print code.contents

Will give you [u'\n', <span id="ref_694653_l">806.93</span>, u'\n'].

In general, scraping Google search results can get really nasty, so try to avoid it if you can.

You might also want to look into Yahoo Finance Python API.

Upvotes: 1

SO44

Reputation: 1329

Looks like that source is from inspecting the element, not the actual source. A couple of suggestions. Use google finance to get rid of some noise - https://www.google.com/finance?q=googl would be the URL. On that page there is a section that looks like this:

<div class=g-unit>
<div id=market-data-div class="id-market-data-div nwp g-floatfix">
<div id=price-panel class="id-price-panel goog-inline-block">
<div>
<span class="pr">
<span id="ref_694653_l">806.93</span>
</span>
<div class="id-price-change nwp">
<span class="ch bld"><span class="chg" id="ref_694653_c">+9.68</span>
<span class="chg" id="ref_694653_cp">(1.21%)</span>
</span>
</div>
</div>

You should be able to pull the number out of that.
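As a rough sketch of what that could look like, the snippet below parses a trimmed copy of the markup above with BeautifulSoup. The HTML is pasted in as a string rather than downloaded, and it assumes the page still uses the `pr` class, which may no longer be true.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup quoted above, used as a stand-in for the real page.
html = """
<div id="price-panel" class="id-price-panel goog-inline-block">
<div>
<span class="pr">
<span id="ref_694653_l">806.93</span>
</span>
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The outer span carries the "pr" class; its text is the price plus whitespace.
price_span = soup.find("span", {"class": "pr"})
price_text = price_span.get_text(strip=True)

# Drop thousands separators before converting, in case the price has them.
price = float(price_text.replace(",", ""))
print(price)
```

Using `get_text(strip=True)` avoids having to index into `contents`, which also contains the surrounding newline strings.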

Upvotes: 1
