Reputation: 63
I am trying to scrape the SVGs from the following link:
https://finance.yahoo.com/quote/AAPL/analysts?p=AAPL
The portion I am trying to scrape is as follows:
I do not need the text in the charts (just the graphs themselves). However, I have never scraped an SVG image before and I'm not sure if it is possible. I looked around but could not find any Python packages that do this directly.
I know that I can take a screenshot of the page with Python using Selenium and then use PIL to crop it and save the result, but I am wondering if there is a more direct way to grab these charts off the page. Any useful packages or implementations would be helpful. Thank you.
Edit: Got some downvotes, but I'm not sure why. Here is how I would implement it my way:
import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class Screenshot(QWebView):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def capture(self, url, output_file):
        self.load(QUrl(url))
        self.wait_load()
        # set to webpage size
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        # render image
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        print 'saving', output_file
        image.save(output_file)

    def wait_load(self, delay=0):
        # process app events until page loaded
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

s = Screenshot()
s.capture('https://finance.yahoo.com/quote/AAPL/analysts?p=AAPL', 'yhf.png')
I would then use the crop function in PIL to cut the charts out of the saved image.
Upvotes: 2
Views: 4704
Reputation: 3892
Using QWebView for web scraping seems odd to me, although I realize it has one advantage: it tells the server "I'm not a web scraper, I'm an embedded browser". Note that this approach is not bulletproof: your scraper can still be detected if it behaves in a way that is unusual for a human user.
This is how I would do it:
If you want to continue using Qt instead, look for methods in the web view that allow inspecting the DOM or extracting the resources the view has downloaded.
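If the charts are inline `<svg>` elements, as the question suggests, one option is to pull their markup straight out of the page HTML instead of screenshotting it. Below is a minimal sketch using only the Python 3 standard library. It assumes you already have the rendered HTML as a string (for example from `frame.toHtml()` in the QWebView code, or from Selenium's `page_source`). `SvgExtractor` and `extract_svgs` are names made up for illustration, and since the charts may be drawn by JavaScript, plainly downloaded HTML might not contain them at all.

```python
from html.parser import HTMLParser


class SvgExtractor(HTMLParser):
    """Collect the markup of every outermost <svg> element in an HTML string."""

    def __init__(self):
        # Keep entities such as &amp; unconverted so the output stays valid markup.
        super().__init__(convert_charrefs=False)
        self.svgs = []      # finished "<svg>...</svg>" strings
        self._parts = []    # pieces of the <svg> currently being read
        self._names = []    # original-case names of the open tags inside it
        self._depth = 0     # nesting depth of <svg> tags

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            self._depth += 1
        if self._depth:
            raw = self.get_starttag_text()
            self._parts.append(raw)
            # HTMLParser lowercases tag names, but SVG names are case-sensitive
            # (e.g. linearGradient), so remember the original spelling.
            self._names.append(raw[1:len(tag) + 1])

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <path ... />
        if self._depth:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth:
            return
        if self._names and self._names[-1].lower() == tag:
            name = self._names.pop()
        else:
            name = tag
        self._parts.append("</%s>" % name)
        if tag == "svg":
            self._depth -= 1
            if self._depth == 0:
                self.svgs.append("".join(self._parts))
                self._parts = []
                self._names = []

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def handle_entityref(self, name):
        if self._depth:
            self._parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self._depth:
            self._parts.append("&#%s;" % name)


def extract_svgs(html):
    """Return a list with the markup of each outermost <svg> element in html."""
    parser = SvgExtractor()
    parser.feed(html)
    parser.close()
    return parser.svgs


if __name__ == "__main__":
    # Hypothetical stand-in for the rendered page source.
    page = '<div><svg width="4" height="4"><rect width="4" height="4"/></svg></div>'
    for i, svg in enumerate(extract_svgs(page)):
        print(i, svg)
```

Each returned string can be written to its own `.svg` file. Note that a standalone SVG file generally needs `xmlns="http://www.w3.org/2000/svg"` on the root element, which inline SVG in HTML often omits, and any styling that came from the page's CSS will not travel with the extracted markup.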
Upvotes: 2