Ni3_k

Reputation: 56

Search for specific class using bs4

I am trying to scrape this Google Finance link. The page has an element with the class SP_arrow_last_off. So, if I do something like this:

url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"

headers={'Host': 'www.google.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
last = soup.find_all(class_= "SP_arrow_last_off")
if(last):
    print("HI")

It does not print anything. Looking closer, what I get in last is an empty list (or None). How can I get True if a class exists and False if it does not?
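For reference, a minimal sketch of the yes/no check itself, continuing from the soup object above and assuming the element is actually present in the HTML that requests returns; find() gives None when nothing matches:

# Continuing from the soup object above: find() returns the first match
# or None, so the comparison yields a plain True/False.
exists = soup.find(class_="SP_arrow_last_off") is not None
print(exists)  # False if the downloaded HTML never contains the element (e.g. it is added by JavaScript)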

Upvotes: 1

Views: 415

Answers (4)

Denis Skopa

Reputation: 99

Unfortunately, the link in the question is no longer active and redirects to the Google Finance home page.

My solution shows, as an example, how to get data you may be interested in from the main page.

You can find the necessary elements on the page using CSS selectors. To identify them, you can use the SelectorGadget Chrome extension and click on the desired element in your browser (it does not always work perfectly if the website is rendered via JavaScript).

Check code in online IDE.

from bs4 import BeautifulSoup
import requests, json, lxml
   
params = {
    "hl": "en",       # language
    "gl": "us"        # country of the search, US-> USA
}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
}

html = requests.get("https://www.google.com/finance/", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

data = []

# each .SxcTic card inside the .H8Ch1 container is one ticker entry
for result in soup.select(".H8Ch1 .SxcTic"):
    name = result.select_one(".ZvmM7").text                   # company name
    price = result.select_one(".YMlKec").text                 # current price
    quote = result.select_one(".COaKTb").text                 # ticker symbol
    price_change_percent = result.select_one(".JwB6zf").text  # change in %
    data.append({
        "name": name,
        "price": price,
        "quote": quote,
        "price_change_percent": price_change_percent
    })

print(json.dumps(data, indent=2))

Example output:

[
  {
    "name": "Tesla Inc",
    "price": "$137.80",
    "quote": "TSLA",
    "price_change_percent": "8.05%"
  },
  {
    "name": "Invesco QQQ Trust Series 1",
    "price": "$269.54",
    "quote": "QQQ",
    "price_change_percent": "0.078%"
  },
  {
    "name": "Palantir Technologies Inc",
    "price": "$6.31",
    "quote": "PLTR",
    "price_change_percent": "0.63%"
  },
  {
    "name": "Nike Inc",
    "price": "$103.21",
    "quote": "NKE",
    "price_change_percent": "0.16%"
  },
  # ...
]
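Tying this back to the original yes/no question, the same kind of check works with CSS selectors; select_one() returns None when nothing matches (the class name below is just the one from the question, kept as an illustrative assumption):

# Works on the soup object built above; select_one() returns None
# when the selector matches nothing, so the comparison is a clean boolean.
exists = soup.select_one(".SP_arrow_last_off") is not None
print(exists)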

FYI, there are blog posts about web scraping Google Finance, such as web scraping the Google Finance main page in Python and scraping Google Finance ticker quote data in Python.

Upvotes: -1

Satish Prakash Garg

Reputation: 2233

The class 'SP_arrow_last_off' exists in the source code but the data is filled into it using JavaScript functions.

If you want the data, you need to look at what is actually present in the page source.

To fetch the data, you can do something like this using the lxml module, which is an order of magnitude faster than BeautifulSoup (if written properly):

import requests
from lxml import html

url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"

headers = {
    'Host': 'www.google.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}

response = requests.get(url, headers=headers)
soup = html.fromstring(response.content)

result_list = []
# each <tr> of the historical-price table holds one day of OHLCV data
for row in soup.xpath('//table[@class="gf-table historical_price"]/tr'):
    data = row.xpath('.//td/text()')
    if data:  # skip rows that yield no <td> text (e.g. the header row)
        result_list.append({'date': data[0].strip(), 'open': data[1].strip(),
                            'high': data[2].strip(), 'low': data[3].strip(),
                            'close': data[4].strip(), 'volume': data[5].strip()})

print(result_list)

This will result in something like this:

[{'volume': '9,253', 'high': '15.70', 'low': '14.15', 'date': 'Jan 22, 2009', 'close': '14.35', 'open': '15.70'}, {'volume': '10,091', 'high': '14.95', 'low': '14.30', 'date': 'Jan 21, 2009', 'close': '14.65', 'open': '14.50'}, {'volume': '9,459', 'high': '15.00', 'low': '14.20', 'date': 'Jan 20, 2009', 'close': '14.90', 'open': '15.00'}, {'volume': '3,768', 'high': '14.90', 'low': '14.30', 'date': 'Jan 19, 2009', 'close': '14.35', 'open': '14.50'}, {'volume': '9,720', 'high': '15.00', 'low': '14.35', 'date': 'Jan 16, 2009', 'close': '14.50', 'open': '14.80'}, {'volume': '5,863', 'high': '15.00', 'low': '14.00', 'date': 'Jan 15, 2009', 'close': '15.00', 'open': '14.75'}, {'volume': '7,952', 'high': '15.50', 'low': '14.25', 'date': 'Jan 14, 2009', 'close': '14.80', 'open': '14.25'}, {'volume': '8,359', 'high': '15.05', 'low': '14.20', 'date': 'Jan 13, 2009', 'close': '14.65', 'open': '14.55'}, {'volume': '12,854', 'high': '15.85', 'low': '14.40', 'date': 'Jan 12, 2009', 'close': '14.90', 'open': '15.00'}, {'volume': '35,580', 'high': '15.45', 'low': '13.20', 'date': 'Jan 9, 2009', 'close': '15.25', 'open': '15.10'}, {'volume': '29,063', 'high': '17.85', 'low': '15.15', 'date': 'Jan 7, 2009', 'close': '15.85', 'open': '17.50'}, {'volume': '16,543', 'high': '18.30', 'low': '17.55', 'date': 'Jan 6, 2009', 'close': '17.90', 'open': '17.70'}, {'volume': '36,993', 'high': '19.50', 'low': '18.00', 'date': 'Jan 5, 2009', 'close': '18.30', 'open': '18.90'}, {'volume': '120,522', 'high': '19.70', 'low': '17.30', 'date': 'Jan 2, 2009', 'close': '18.30', 'open': '17.30'}, {'volume': '16,329', 'high': '16.00', 'low': '15.10', 'date': 'Dec 31, 2008', 'close': '15.70', 'open': '15.85'}, {'volume': '53,500', 'high': '16.30', 'low': '14.90', 'date': 'Dec 30, 2008', 'close': '15.10', 'open': '16.00'}, {'volume': '14,006', 'high': '16.30', 'low': '15.10', 'date': 'Dec 29, 2008', 'close': '15.40', 'open': '15.50'}, {'volume': '5,025', 'high': '16.50', 'low': '15.50', 'date': 'Dec 26, 2008', 'close': '15.60', 'open': '16.30'}, {'volume': '17,318', 'high': '16.35', 'low': '15.50', 'date': 'Dec 24, 2008', 'close': '16.05', 'open': '16.35'}, {'volume': '11,175', 'high': '16.55', 'low': '16.00', 'date': 'Dec 23, 2008', 'close': '16.15', 'open': '16.25'}, {'volume': '13,192', 'high': '17.20', 'low': '16.35', 'date': 'Dec 22, 2008', 'close': '16.80', 'open': '16.90'}, {'volume': '37,826', 'high': '17.45', 'low': '16.25', 'date': 'Dec 19, 2008', 'close': '16.60', 'open': '16.95'}, {'volume': '10,818', 'high': '17.00', 'low': '16.25', 'date': 'Dec 18, 2008', 'close': '16.60', 'open': '16.50'}, {'volume': '26,070', 'high': '18.50', 'low': '16.70', 'date': 'Dec 17, 2008', 'close': '16.70', 'open': '17.95'}, {'volume': '15,573', 'high': '18.00', 'low': '17.05', 'date': 'Dec 16, 2008', 'close': '17.55', 'open': '17.45'}, {'volume': '18,849', 'high': '17.65', 'low': '16.75', 'date': 'Dec 15, 2008', 'close': '17.10', 'open': '17.65'}, {'volume': '37,383', 'high': '18.45', 'low': '16.05', 'date': 'Dec 12, 2008', 'close': '16.50', 'open': '17.25'}, {'volume': '57,272', 'high': '18.80', 'low': '16.50', 'date': 'Dec 11, 2008', 'close': '18.15', 'open': '16.75'}, {'volume': '34,212', 'high': '17.95', 'low': '16.05', 'date': 'Dec 10, 2008', 'close': '17.95', 'open': '16.50'}, {'volume': '11,611', 'high': '18.00', 'low': '16.00', 'date': 'Dec 8, 2008', 'close': '16.10', 'open': '18.00'}, {'volume': '20,052', 'high': '17.50', 'low': '15.65', 'date': 'Dec 5, 2008', 'close': '16.40', 'open': '16.60'}, {'volume': '9,132', 'high': 
'17.00', 'low': '14.75', 'date': 'Dec 4, 2008', 'close': '16.15', 'open': '14.75'}, {'volume': '6,023', 'high': '16.45', 'low': '15.70', 'date': 'Dec 3, 2008', 'close': '16.00', 'open': '16.00'}, {'volume': '13,567', 'high': '16.30', 'low': '15.10', 'date': 'Dec 2, 2008', 'close': '15.55', 'open': '16.30'}, {'volume': '15,421', 'high': '17.15', 'low': '15.05', 'date': 'Dec 1, 2008', 'close': '16.70', 'open': '15.05'}, {'volume': '3,543', 'high': '17.35', 'low': '16.25', 'date': 'Nov 28, 2008', 'close': '16.35', 'open': '16.25'}, {'volume': '11,130', 'high': '17.65', 'low': '16.55', 'date': 'Nov 26, 2008', 'close': '16.90', 'open': '17.25'}, {'volume': '126,113', 'high': '19.90', 'low': '16.25', 'date': 'Nov 25, 2008', 'close': '17.00', 'open': '16.80'}, {'volume': '17,069', 'high': '17.55', 'low': '15.75', 'date': 'Nov 24, 2008', 'close': '16.50', 'open': '15.75'}, {'volume': '10,550', 'high': '16.35', 'low': '15.30', 'date': 'Nov 21, 2008', 'close': '16.00', 'open': '15.80'}, {'volume': '9,892', 'high': '17.00', 'low': '16.00', 'date': 'Nov 20, 2008', 'close': '16.25', 'open': '16.00'}, {'volume': '16,597', 'high': '17.65', 'low': '16.50', 'date': 'Nov 19, 2008', 'close': '16.55', 'open': '17.15'}, {'volume': '13,041', 'high': '18.00', 'low': '16.70', 'date': 'Nov 18, 2008', 'close': '17.10', 'open': '17.70'}, {'volume': '13,403', 'high': '18.45', 'low': '17.30', 'date': 'Nov 17, 2008', 'close': '18.00', 'open': '18.20'}, {'volume': '24,101', 'high': '19.20', 'low': '18.15', 'date': 'Nov 14, 2008', 'close': '18.45', 'open': '19.00'}, {'volume': '68,975', 'high': '18.85', 'low': '17.55', 'date': 'Nov 12, 2008', 'close': '18.60', 'open': '18.40'}, {'volume': '35,525', 'high': '20.05', 'low': '18.25', 'date': 'Nov 11, 2008', 'close': '18.30', 'open': '20.05'}, {'volume': '152,431', 'high': '22.35', 'low': '19.65', 'date': 'Nov 10, 2008', 'close': '20.00', 'open': '21.20'}, {'volume': '245,444', 'high': '21.60', 'low': '17.60', 'date': 'Nov 7, 2008', 'close': '20.00', 'open': '17.60'}, {'volume': '40,649', 'high': '18.80', 'low': '17.10', 'date': 'Nov 6, 2008', 'close': '18.30', 'open': '17.15'}, {'volume': '116,608', 'high': '19.45', 'low': '14.90', 'date': 'Nov 5, 2008', 'close': '18.55', 'open': '18.30'}, {'volume': '113,707', 'high': '19.50', 'low': '16.50', 'date': 'Nov 4, 2008', 'close': '18.05', 'open': '17.00'}, {'volume': '54,681', 'high': '18.00', 'low': '16.75', 'date': 'Nov 3, 2008', 'close': '17.10', 'open': '17.80'}, {'volume': '70,763', 'high': '18.60', 'low': '16.70', 'date': 'Oct 31, 2008', 'close': '17.05', 'open': '17.20'}, {'volume': '60,138', 'high': '19.10', 'low': '16.00', 'date': 'Oct 29, 2008', 'close': '16.45', 'open': '19.10'}, {'volume': '70,725', 'high': '16.95', 'low': '13.50', 'date': 'Oct 27, 2008', 'close': '14.60', 'open': '15.25'}, {'volume': '61,150', 'high': '19.90', 'low': '16.05', 'date': 'Oct 24, 2008', 'close': '16.50', 'open': '17.25'}, {'volume': '54,468', 'high': '20.25', 'low': '16.30', 'date': 'Oct 23, 2008', 'close': '18.75', 'open': '18.30'}, {'volume': '164,349', 'high': '22.20', 'low': '20.00', 'date': 'Oct 22, 2008', 'close': '20.25', 'open': '21.85'}, {'volume': '88,705', 'high': '22.95', 'low': '21.10', 'date': 'Oct 21, 2008', 'close': '21.40', 'open': '22.80'}, {'volume': '361,409', 'high': '23.25', 'low': '19.70', 'date': 'Oct 20, 2008', 'close': '21.80', 'open': '22.50'}, {'volume': '903,134', 'high': '28.70', 'low': '21.95', 'date': 'Oct 17, 2008', 'close': '21.95', 'open': '28.05'}, {'volume': '972,087', 'high': '29.25', 'low': 
'21.60', 'date': 'Oct 16, 2008', 'close': '26.50', 'open': '22.05'}, {'volume': '563,418', 'high': '25.55', 'low': '20.05', 'date': 'Oct 15, 2008', 'close': '24.55', 'open': '21.30'}, {'volume': '336,544', 'high': '26.00', 'low': '21.50', 'date': 'Oct 14, 2008', 'close': '22.10', 'open': '25.65'}, {'volume': '449,346', 'high': '26.60', 'low': '23.30', 'date': 'Oct 13, 2008', 'close': '24.70', 'open': '24.30'}, {'volume': '603,964', 'high': '24.90', 'low': '21.65', 'date': 'Oct 10, 2008', 'close': '23.65', 'open': '24.90'}, {'volume': '1,232,192', 'high': '29.20', 'low': '25.10', 'date': 'Oct 8, 2008', 'close': '26.40', 'open': '28.00'}, {'volume': '4,556,711', 'high': '38.00', 'low': '27.85', 'date': 'Oct 7, 2008', 'close': '30.05', 'open': '32.00'}, {'volume': '11,750,865', 'high': '80.00', 'low': '31.60', 'date': 'Oct 6, 2008', 'close': '33.55', 'open': '80.00'}]
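And if all you need is the original True/False check rather than the full table, a short continuation of the snippet above (note that contains() also matches class attributes that merely include this substring):

# soup here is the lxml tree returned by html.fromstring() above.
# xpath() returns a list, so an empty list means the class is absent.
has_arrow = bool(soup.xpath('//*[contains(@class, "SP_arrow_last_off")]'))
print(has_arrow)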

Upvotes: 3

thebadguy

Reputation: 2140

With the help of PhantomJS (http://phantomjs.org/download.html) and Selenium, you can do this.

Steps:

1. On a terminal or cmd, run: pip install selenium
2. Download PhantomJS and unzip it, then put phantomjs.exe on the Python path, for example on Windows: C:\Python27

Then use this code; it will give you the desired result:

from selenium import webdriver
from bs4 import BeautifulSoup


url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"

# PhantomJS renders the page, executing the JavaScript that requests alone cannot
driver = webdriver.PhantomJS()
driver.get(url)

# page_source now contains the fully rendered HTML
data = driver.page_source

soup = BeautifulSoup(data, 'html.parser')

last = soup.find_all(class_="SP_arrow_last_off")

if last:
    print("HI")

This code will give you a non-empty value for last and will print HI.

Upvotes: 1

Vlad

Reputation: 560

It seems you need to render the page in a browser first, using a module like Selenium. There is no element with the class SP_arrow_last_off in the page source; it is probably generated by JavaScript, so you don't get it with the requests module.
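A quick way to confirm this, as a rough sketch: check whether the class name appears anywhere in the raw HTML that requests downloads, before reaching for a browser.

import requests

url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"
response = requests.get(url)

# If the class name is not even present in the downloaded markup,
# the element is almost certainly created client-side by JavaScript.
print("SP_arrow_last_off" in response.text)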

Upvotes: 0
