Reputation: 3
I'm mostly a lurker on this platform and usually solve my problems using answers to questions that have already been asked, but I couldn't find one for my current problem. I'm trying to scrape data from this website using Scrapy. I'm already able to scrape most of the data I need; however, there are two interactive Highcharts charts I'd like to get the data from. (Picture of the first graph)
What I tried so far:
A hint and/or an explanation of how to scrape this chart data from this website would be much appreciated.
To see the graphs, you have to log in here.
I've created a throwaway account (email: [email protected], password: 12345) so you can see the data.
Sebastian's answer pointed me in the right direction.
I ended up using scrapy_splash, which lets you execute JavaScript code from a Lua script. With the code below I'm able to scrape all the data I need.
from scrapy_splash import SplashRequest

LUA_SCRIPT = """
function main(splash)
    -- Reuse the cookies from the previous (logged-in) session
    splash:init_cookies(splash.args.cookies)
    assert(splash:go(splash.args.url))
    assert(splash:wait(0.5))

    -- Extract data from the page
    -- Read the number of series in the second chart
    local table_2_no_series = splash:evaljs('Highcharts.charts[1].series.length')
    local table_2_y1_data, table_2_y1_name, table_2_y3_data, table_2_y3_name
    -- If the second chart has more than one series, get that data as well
    if table_2_no_series == 2 or table_2_no_series == 3 then
        table_2_y1_data = splash:evaljs('Highcharts.charts[1].series[0].yData')
        table_2_y1_name = splash:evaljs('Highcharts.charts[1].series[0].name')
    end
    if table_2_no_series == 3 then
        table_2_y3_data = splash:evaljs('Highcharts.charts[1].series[2].yData')
        table_2_y3_name = splash:evaljs('Highcharts.charts[1].series[2].name')
    end

    return {
        -- Website title
        title = splash:evaljs('document.title'),
        -- First chart: title
        table_1_name = splash:evaljs('Highcharts.charts[0].title.textStr'),
        -- First chart: timestamps (x-axis categories)
        table_1_x = splash:evaljs('Highcharts.charts[0].series[0].xAxis.categories'),
        -- First chart: Finanzierungsstand series (values and series name)
        table_1_y_data = splash:evaljs('Highcharts.charts[0].series[1].yData'),
        table_1_y_name = splash:evaljs('Highcharts.charts[0].series[1].name'),
        -- Second chart: must be keyed entries; keys whose value is nil are omitted
        table_2_y1_data = table_2_y1_data,
        table_2_y1_name = table_2_y1_name,
        table_2_y3_data = table_2_y3_data,
        table_2_y3_name = table_2_y3_name,
        cookies = splash:get_cookies(),
    }
end
"""

# Inside the spider callback that runs after logging in (uses self.cookies and response)
SCRAPY_ARGS = {
    'lua_source': LUA_SCRIPT,
    'cookies': self.cookies,
}
# Request the chart data now that we have (hopefully) logged in successfully
yield SplashRequest(url=response.url,
                    callback=self.parse_highchart_data,
                    endpoint='execute', args=SCRAPY_ARGS,
                    session_id="foo")
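The callback `parse_highchart_data` isn't shown above. With `endpoint='execute'`, scrapy-splash passes the callback a JSON response whose decoded body is available as `response.data`. A minimal sketch, assuming the keys returned by the Lua script above (`table_1_x`, `table_1_y_data`) and that the categories and `yData` line up one-to-one; the helper name `pair_points` is hypothetical:

```python
def pair_points(data):
    """Pair each x-axis category of the first chart with its y value.

    `data` is the dict returned by the Lua script (e.g. response.data
    inside parse_highchart_data). Missing keys yield an empty list.
    """
    categories = data.get("table_1_x") or []
    values = data.get("table_1_y_data") or []
    # zip() stops at the shorter list, so a length mismatch drops trailing points
    return [{"x": x, "y": y} for x, y in zip(categories, values)]


# Hypothetical usage inside the spider:
# def parse_highchart_data(self, response):
#     for point in pair_points(response.data):
#         yield point

sample = {"table_1_x": ["Jan", "Feb"], "table_1_y_data": [10, 25]}
print(pair_points(sample))
```

Yielding plain dicts keeps the callback compatible with Scrapy's built-in feed exports.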
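Note: The Highcharts API also has a .getCSV() method, which exports the chart data in CSV format. However, it seems this site has blocked that function (it comes from Highcharts' export-data module, which may simply not be loaded here).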
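Since `getCSV()` isn't usable on this site, one workaround is to build the CSV yourself from the arrays the Lua script already returns. A sketch using only Python's `csv` module; the function name `to_csv` and the column layout (one category column plus one column per series) are assumptions, not a reproduction of Highcharts' own export format:

```python
import csv
import io


def to_csv(categories, series):
    """Build CSV text: one row per category, one column per named series.

    `series` maps a series name to its list of y values (e.g. yData).
    Series shorter than `categories` are padded with empty cells.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    names = list(series)
    writer.writerow(["Category"] + names)
    for i, category in enumerate(categories):
        row = [category]
        for name in names:
            values = series[name]
            row.append(values[i] if i < len(values) else "")
        writer.writerow(row)
    return buf.getvalue()


print(to_csv(["2020", "2021"], {"Finanzierungsstand": [1.5, 2.0]}))
```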
Upvotes: 0
Views: 1648
Reputation: 179
This worked for me:
console.log(Highcharts.charts[1].series[0].processedYData)
Upvotes: 0
Reputation: 11633
It's not exactly a scraping/fetching approach, but on a page that uses Highcharts you can inspect the whole chart config in the browser's web console. Try:
console.log(Highcharts.charts)
This shows the array of charts rendered on the page. Next, drill down into a particular chart -> series -> data, for example:
console.log(Highcharts.charts[0].series[1].data)
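The same chart -> series -> data structure can be walked in one Splash call instead of hard-coding indices like `charts[1].series[2]`. A sketch under assumptions: each chart uses a category x-axis, `yData` is populated (as in the question), and Splash converts the returned JS arrays to JSON lists. `Highcharts.charts` can contain `undefined` slots left by destroyed charts, hence the `filter(Boolean)`. The names `ALL_CHARTS_LUA` and `find_series` are hypothetical:

```python
# Hypothetical Lua script for Splash's /execute endpoint: serializes every chart
# on the page in a single evaljs call instead of hard-coding chart/series indices.
ALL_CHARTS_LUA = """
function main(splash)
    splash:init_cookies(splash.args.cookies)
    assert(splash:go(splash.args.url))
    assert(splash:wait(0.5))
    -- filter(Boolean) skips undefined slots left behind by destroyed charts
    local charts = splash:evaljs([[
        Highcharts.charts.filter(Boolean).map(function (c) {
            return {
                title: c.title ? c.title.textStr : null,
                categories: c.xAxis[0].categories || [],
                series: c.series.map(function (s) {
                    return {name: s.name, y: s.yData};
                })
            };
        })
    ]])
    return {charts = charts, cookies = splash:get_cookies()}
end
"""


def find_series(charts, name):
    """Return (categories, y values) of the first series called `name`, or None.

    `charts` is the decoded "charts" list, e.g. response.data["charts"].
    Empty Lua tables may arrive as {} instead of [], so both are normalized.
    """
    for chart in charts:
        for series in chart.get("series") or []:
            if series.get("name") == name:
                return list(chart.get("categories") or []), list(series.get("y") or [])
    return None
```

Looking series up by name rather than position keeps the spider working if the site reorders or adds series.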
Upvotes: 1