Reputation: 56
I am trying to scrape this Google Finance link. The page has an element with the class SP_arrow_last_off. So, if I do something like this:
import requests
from bs4 import BeautifulSoup

url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"
headers = {
    'Host': 'www.google.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
# find_all returns a (possibly empty) list of matching elements
last = soup.find_all(class_="SP_arrow_last_off")
if last:
    print("HI")
It does not print anything. When I inspect last, it turns out to be an empty list. How can I get True if the class exists and False if it does not?
Upvotes: 1
Views: 415
Reputation: 99
Unfortunately, the link in the question is no longer active and redirects to the Google Finance home page.
My solution therefore shows, as an example, how to get data you may be interested in from the main page.
You can find the necessary elements on the page using CSS selectors. To pick them, you can use the SelectorGadget Chrome extension and click on the desired element in your browser (this does not always work well if the website is rendered via JavaScript).
from bs4 import BeautifulSoup
import requests, json, lxml

params = {
    "hl": "en",  # language
    "gl": "us"   # country of the search, us -> USA
}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
}

html = requests.get("https://www.google.com/finance/", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

data = []

for result in soup.select(".H8Ch1 .SxcTic"):
    name = result.select_one(".ZvmM7").text
    price = result.select_one(".YMlKec").text
    quote = result.select_one(".COaKTb").text
    price_change_percent = result.select_one(".JwB6zf").text

    data.append({
        "name": name,
        "price": price,
        "quote": quote,
        "price_change_percent": price_change_percent
    })

print(json.dumps(data, indent=2))
Example output:
[
  {
    "name": "Tesla Inc",
    "price": "$137.80",
    "quote": "TSLA",
    "price_change_percent": "8.05%"
  },
  {
    "name": "Invesco QQQ Trust Series 1",
    "price": "$269.54",
    "quote": "QQQ",
    "price_change_percent": "0.078%"
  },
  {
    "name": "Palantir Technologies Inc",
    "price": "$6.31",
    "quote": "PLTR",
    "price_change_percent": "0.63%"
  },
  {
    "name": "Nike Inc",
    "price": "$103.21",
    "quote": "NKE",
    "price_change_percent": "0.16%"
  },
  # ...
]
FYI, there are blog posts about scraping Google Finance, such as "web scraping Google Finance main page in Python" and "scrape Google Finance ticker quote data in Python".
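Note that select_one returns None when a selector matches nothing, so the .text lookups in the loop above raise AttributeError as soon as one class name changes on the page. A minimal sketch of a guard, run against a small inline snippet rather than the live page; the helper name text_or_none and the sample markup are made up for illustration and are not taken from Google's actual HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<div class="H8Ch1">
  <div class="SxcTic">
    <div class="ZvmM7">Tesla Inc</div>
    <div class="YMlKec">$137.80</div>
  </div>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = []
for result in soup.select(".H8Ch1 .SxcTic"):
    rows.append({
        "name": text_or_none(result, ".ZvmM7"),
        "price": text_or_none(result, ".YMlKec"),
        "quote": text_or_none(result, ".COaKTb"),  # absent in the sample markup
    })

print(rows)
```

With this approach a missing element shows up as None in the result instead of stopping the whole scrape.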
Upvotes: -1
Reputation: 2233
The class 'SP_arrow_last_off' exists in the page source, but the data is filled into it by JavaScript functions.
If you need the data, you have to understand how it is laid out in the source code.
To fetch the data, you can do something like this using the lxml module, which can be considerably faster than BeautifulSoup (if written properly):
import requests
from lxml import html

url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"
headers = {
    'Host': 'www.google.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}
response = requests.get(url, headers=headers)
soup = html.fromstring(response.content)

result_list = []
# Each <tr> of the historical price table is one trading day
for row in soup.xpath('//table[@class="gf-table historical_price"]/tr'):
    data = row.xpath('.//td/text()')
    if data:  # header rows contain no <td> cells and are skipped
        result_list.append({'date': data[0].strip(), 'open': data[1].strip(),
                            'high': data[2].strip(), 'low': data[3].strip(),
                            'close': data[4].strip(), 'volume': data[5].strip()})

print(result_list)
This will result in something like this:
[{'volume': '9,253', 'high': '15.70', 'low': '14.15', 'date': 'Jan 22, 2009', 'close': '14.35', 'open': '15.70'}, {'volume': '10,091', 'high': '14.95', 'low': '14.30', 'date': 'Jan 21, 2009', 'close': '14.65', 'open': '14.50'}, {'volume': '9,459', 'high': '15.00', 'low': '14.20', 'date': 'Jan 20, 2009', 'close': '14.90', 'open': '15.00'}, {'volume': '3,768', 'high': '14.90', 'low': '14.30', 'date': 'Jan 19, 2009', 'close': '14.35', 'open': '14.50'}, {'volume': '9,720', 'high': '15.00', 'low': '14.35', 'date': 'Jan 16, 2009', 'close': '14.50', 'open': '14.80'}, {'volume': '5,863', 'high': '15.00', 'low': '14.00', 'date': 'Jan 15, 2009', 'close': '15.00', 'open': '14.75'}, {'volume': '7,952', 'high': '15.50', 'low': '14.25', 'date': 'Jan 14, 2009', 'close': '14.80', 'open': '14.25'}, {'volume': '8,359', 'high': '15.05', 'low': '14.20', 'date': 'Jan 13, 2009', 'close': '14.65', 'open': '14.55'}, {'volume': '12,854', 'high': '15.85', 'low': '14.40', 'date': 'Jan 12, 2009', 'close': '14.90', 'open': '15.00'}, {'volume': '35,580', 'high': '15.45', 'low': '13.20', 'date': 'Jan 9, 2009', 'close': '15.25', 'open': '15.10'}, {'volume': '29,063', 'high': '17.85', 'low': '15.15', 'date': 'Jan 7, 2009', 'close': '15.85', 'open': '17.50'}, {'volume': '16,543', 'high': '18.30', 'low': '17.55', 'date': 'Jan 6, 2009', 'close': '17.90', 'open': '17.70'}, {'volume': '36,993', 'high': '19.50', 'low': '18.00', 'date': 'Jan 5, 2009', 'close': '18.30', 'open': '18.90'}, {'volume': '120,522', 'high': '19.70', 'low': '17.30', 'date': 'Jan 2, 2009', 'close': '18.30', 'open': '17.30'}, {'volume': '16,329', 'high': '16.00', 'low': '15.10', 'date': 'Dec 31, 2008', 'close': '15.70', 'open': '15.85'}, {'volume': '53,500', 'high': '16.30', 'low': '14.90', 'date': 'Dec 30, 2008', 'close': '15.10', 'open': '16.00'}, {'volume': '14,006', 'high': '16.30', 'low': '15.10', 'date': 'Dec 29, 2008', 'close': '15.40', 'open': '15.50'}, {'volume': '5,025', 'high': '16.50', 'low': '15.50', 'date': 'Dec 26, 
2008', 'close': '15.60', 'open': '16.30'}, {'volume': '17,318', 'high': '16.35', 'low': '15.50', 'date': 'Dec 24, 2008', 'close': '16.05', 'open': '16.35'}, {'volume': '11,175', 'high': '16.55', 'low': '16.00', 'date': 'Dec 23, 2008', 'close': '16.15', 'open': '16.25'}, {'volume': '13,192', 'high': '17.20', 'low': '16.35', 'date': 'Dec 22, 2008', 'close': '16.80', 'open': '16.90'}, {'volume': '37,826', 'high': '17.45', 'low': '16.25', 'date': 'Dec 19, 2008', 'close': '16.60', 'open': '16.95'}, {'volume': '10,818', 'high': '17.00', 'low': '16.25', 'date': 'Dec 18, 2008', 'close': '16.60', 'open': '16.50'}, {'volume': '26,070', 'high': '18.50', 'low': '16.70', 'date': 'Dec 17, 2008', 'close': '16.70', 'open': '17.95'}, {'volume': '15,573', 'high': '18.00', 'low': '17.05', 'date': 'Dec 16, 2008', 'close': '17.55', 'open': '17.45'}, {'volume': '18,849', 'high': '17.65', 'low': '16.75', 'date': 'Dec 15, 2008', 'close': '17.10', 'open': '17.65'}, {'volume': '37,383', 'high': '18.45', 'low': '16.05', 'date': 'Dec 12, 2008', 'close': '16.50', 'open': '17.25'}, {'volume': '57,272', 'high': '18.80', 'low': '16.50', 'date': 'Dec 11, 2008', 'close': '18.15', 'open': '16.75'}, {'volume': '34,212', 'high': '17.95', 'low': '16.05', 'date': 'Dec 10, 2008', 'close': '17.95', 'open': '16.50'}, {'volume': '11,611', 'high': '18.00', 'low': '16.00', 'date': 'Dec 8, 2008', 'close': '16.10', 'open': '18.00'}, {'volume': '20,052', 'high': '17.50', 'low': '15.65', 'date': 'Dec 5, 2008', 'close': '16.40', 'open': '16.60'}, {'volume': '9,132', 'high': '17.00', 'low': '14.75', 'date': 'Dec 4, 2008', 'close': '16.15', 'open': '14.75'}, {'volume': '6,023', 'high': '16.45', 'low': '15.70', 'date': 'Dec 3, 2008', 'close': '16.00', 'open': '16.00'}, {'volume': '13,567', 'high': '16.30', 'low': '15.10', 'date': 'Dec 2, 2008', 'close': '15.55', 'open': '16.30'}, {'volume': '15,421', 'high': '17.15', 'low': '15.05', 'date': 'Dec 1, 2008', 'close': '16.70', 'open': '15.05'}, {'volume': '3,543', 
'high': '17.35', 'low': '16.25', 'date': 'Nov 28, 2008', 'close': '16.35', 'open': '16.25'}, {'volume': '11,130', 'high': '17.65', 'low': '16.55', 'date': 'Nov 26, 2008', 'close': '16.90', 'open': '17.25'}, {'volume': '126,113', 'high': '19.90', 'low': '16.25', 'date': 'Nov 25, 2008', 'close': '17.00', 'open': '16.80'}, {'volume': '17,069', 'high': '17.55', 'low': '15.75', 'date': 'Nov 24, 2008', 'close': '16.50', 'open': '15.75'}, {'volume': '10,550', 'high': '16.35', 'low': '15.30', 'date': 'Nov 21, 2008', 'close': '16.00', 'open': '15.80'}, {'volume': '9,892', 'high': '17.00', 'low': '16.00', 'date': 'Nov 20, 2008', 'close': '16.25', 'open': '16.00'}, {'volume': '16,597', 'high': '17.65', 'low': '16.50', 'date': 'Nov 19, 2008', 'close': '16.55', 'open': '17.15'}, {'volume': '13,041', 'high': '18.00', 'low': '16.70', 'date': 'Nov 18, 2008', 'close': '17.10', 'open': '17.70'}, {'volume': '13,403', 'high': '18.45', 'low': '17.30', 'date': 'Nov 17, 2008', 'close': '18.00', 'open': '18.20'}, {'volume': '24,101', 'high': '19.20', 'low': '18.15', 'date': 'Nov 14, 2008', 'close': '18.45', 'open': '19.00'}, {'volume': '68,975', 'high': '18.85', 'low': '17.55', 'date': 'Nov 12, 2008', 'close': '18.60', 'open': '18.40'}, {'volume': '35,525', 'high': '20.05', 'low': '18.25', 'date': 'Nov 11, 2008', 'close': '18.30', 'open': '20.05'}, {'volume': '152,431', 'high': '22.35', 'low': '19.65', 'date': 'Nov 10, 2008', 'close': '20.00', 'open': '21.20'}, {'volume': '245,444', 'high': '21.60', 'low': '17.60', 'date': 'Nov 7, 2008', 'close': '20.00', 'open': '17.60'}, {'volume': '40,649', 'high': '18.80', 'low': '17.10', 'date': 'Nov 6, 2008', 'close': '18.30', 'open': '17.15'}, {'volume': '116,608', 'high': '19.45', 'low': '14.90', 'date': 'Nov 5, 2008', 'close': '18.55', 'open': '18.30'}, {'volume': '113,707', 'high': '19.50', 'low': '16.50', 'date': 'Nov 4, 2008', 'close': '18.05', 'open': '17.00'}, {'volume': '54,681', 'high': '18.00', 'low': '16.75', 'date': 'Nov 3, 2008', 
'close': '17.10', 'open': '17.80'}, {'volume': '70,763', 'high': '18.60', 'low': '16.70', 'date': 'Oct 31, 2008', 'close': '17.05', 'open': '17.20'}, {'volume': '60,138', 'high': '19.10', 'low': '16.00', 'date': 'Oct 29, 2008', 'close': '16.45', 'open': '19.10'}, {'volume': '70,725', 'high': '16.95', 'low': '13.50', 'date': 'Oct 27, 2008', 'close': '14.60', 'open': '15.25'}, {'volume': '61,150', 'high': '19.90', 'low': '16.05', 'date': 'Oct 24, 2008', 'close': '16.50', 'open': '17.25'}, {'volume': '54,468', 'high': '20.25', 'low': '16.30', 'date': 'Oct 23, 2008', 'close': '18.75', 'open': '18.30'}, {'volume': '164,349', 'high': '22.20', 'low': '20.00', 'date': 'Oct 22, 2008', 'close': '20.25', 'open': '21.85'}, {'volume': '88,705', 'high': '22.95', 'low': '21.10', 'date': 'Oct 21, 2008', 'close': '21.40', 'open': '22.80'}, {'volume': '361,409', 'high': '23.25', 'low': '19.70', 'date': 'Oct 20, 2008', 'close': '21.80', 'open': '22.50'}, {'volume': '903,134', 'high': '28.70', 'low': '21.95', 'date': 'Oct 17, 2008', 'close': '21.95', 'open': '28.05'}, {'volume': '972,087', 'high': '29.25', 'low': '21.60', 'date': 'Oct 16, 2008', 'close': '26.50', 'open': '22.05'}, {'volume': '563,418', 'high': '25.55', 'low': '20.05', 'date': 'Oct 15, 2008', 'close': '24.55', 'open': '21.30'}, {'volume': '336,544', 'high': '26.00', 'low': '21.50', 'date': 'Oct 14, 2008', 'close': '22.10', 'open': '25.65'}, {'volume': '449,346', 'high': '26.60', 'low': '23.30', 'date': 'Oct 13, 2008', 'close': '24.70', 'open': '24.30'}, {'volume': '603,964', 'high': '24.90', 'low': '21.65', 'date': 'Oct 10, 2008', 'close': '23.65', 'open': '24.90'}, {'volume': '1,232,192', 'high': '29.20', 'low': '25.10', 'date': 'Oct 8, 2008', 'close': '26.40', 'open': '28.00'}, {'volume': '4,556,711', 'high': '38.00', 'low': '27.85', 'date': 'Oct 7, 2008', 'close': '30.05', 'open': '32.00'}, {'volume': '11,750,865', 'high': '80.00', 'low': '31.60', 'date': 'Oct 6, 2008', 'close': '33.55', 'open': '80.00'}]
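All values in this result are strings. If numbers and dates are needed, a conversion step along these lines could follow. This is only a sketch: parse_row is a hypothetical helper, and it assumes English month abbreviations, commas as thousands separators, and no placeholder cells such as "-":

```python
from datetime import datetime


def parse_row(row):
    """Convert one scraped row of strings into typed values."""

    def to_float(text):
        # Drop thousands separators before converting
        return float(text.replace(",", ""))

    return {
        "date": datetime.strptime(row["date"], "%b %d, %Y").date(),
        "open": to_float(row["open"]),
        "high": to_float(row["high"]),
        "low": to_float(row["low"]),
        "close": to_float(row["close"]),
        "volume": int(row["volume"].replace(",", "")),
    }


# First row of the output shown above
sample = {'volume': '9,253', 'high': '15.70', 'low': '14.15',
          'date': 'Jan 22, 2009', 'close': '14.35', 'open': '15.70'}
parsed = parse_row(sample)
print(parsed)
```

Applied to the whole result, this would be [parse_row(r) for r in result_list].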
Upvotes: 3
Reputation: 2140
With the help of PhantomJS (http://phantomjs.org/download.html) and Selenium you can do this.
Steps:
1. In a terminal or cmd, run: pip install selenium
2. Download PhantomJS, unzip it, and put "phantomjs.exe" on your Python path, for example C:\Python27 on Windows.
Then use this code; it will give you the desired result:
from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://www.google.com/finance/historical?cid=4899364&startdate=Jan%201%2C%202000&enddate=Mar%2023%2C%202017&start=2000&num=200&ei=cRHaWNj3FISougTSg6moCw"
# webdriver.PhantomJS needs Selenium 3.x or earlier; it was removed in Selenium 4
driver = webdriver.PhantomJS()
driver.get(url)
# page_source holds the HTML after the page's JavaScript has run
data = driver.page_source
soup = BeautifulSoup(data, 'html.parser')
last = soup.find_all(class_="SP_arrow_last_off")
if last:
    print("HI")
With this code, last contains the matching elements and HI is printed.
Upvotes: 1
Reputation: 560
It seems you need to load the page in a browser first, using a module such as Selenium. There's no element with the class SP_arrow_last_off in the page source. It may be generated by some JS code, so you don't get it with the requests module.
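To get the True/False the question asks for, one option is to test whether find returns None. A minimal sketch on inline HTML: has_class, the SP_arrow_next class and both sample strings are hypothetical. The check only sees the HTML text it is given, so for a JavaScript-rendered page it needs the rendered source (for example driver.page_source from Selenium) rather than response.text:

```python
from bs4 import BeautifulSoup


def has_class(html_text, class_name):
    """Return True if any element in html_text carries class_name."""
    soup = BeautifulSoup(html_text, "html.parser")
    # find returns the first matching Tag, or None when nothing matches
    return soup.find(class_=class_name) is not None


# Stand-in for the raw source, before any JavaScript has run
static_html = '<div class="nav"><a class="SP_arrow_next">Next</a></div>'
# Stand-in for the source after JavaScript has added the element
rendered_html = static_html + '<a class="SP_arrow_last_off">Last</a>'

print(has_class(static_html, "SP_arrow_last_off"))    # False
print(has_class(rendered_html, "SP_arrow_last_off"))  # True
```

Wrapping the result of find_all in bool() gives the same answer, since an empty result list is falsy.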
Upvotes: 0