reset--hard

Reputation: 3

Scraping data from an interactive highchart.js graph

I'm mostly a lurker on this platform and usually solve my problems using the answers to existing questions, but I couldn't find one that covers my current problem. I'm trying to scrape data from this website using Scrapy. I can already scrape most of the data I need; however, there are two interactive Highcharts charts whose data I'd also like to have. (Picture of first graph)

What I tried so far:

A hint and/or explanation how to scrape this chart data from this website would be much appreciated.

To see the graphs you have to log in here. I've created a throwaway account (email: [email protected], password: 12345) so you can see the data.


Update:

Sebastian's answer pointed me in the right direction. I ended up using scrapy-splash, which lets you execute JavaScript on the page from a Lua script. With the code below I'm able to scrape all the data I need.

        LUA_SCRIPT = """
            function main(splash)
                 
                 -- Get cookies from previous session
                 splash:init_cookies(splash.args.cookies)
                 assert(splash:go(splash.args.url))
                 assert(splash:wait(0.5))
                 
                 -- Extract data from page
                 -- Read the number of series in the second chart
                 table_2_no_series = splash:evaljs('Highcharts.charts[1].series.length')

                 -- If the second chart has more than one series, get that data as well
                 if (table_2_no_series==2) or (table_2_no_series==3) then
                    table_2_y1_data = splash:evaljs('Highcharts.charts[1].series[0].yData')
                    table_2_y1_name = splash:evaljs('Highcharts.charts[1].series[0].name')
                 end
                 if (table_2_no_series==3) then
                    table_2_y3_data = splash:evaljs('Highcharts.charts[1].series[2].yData')
                    table_2_y3_name = splash:evaljs('Highcharts.charts[1].series[2].name')  
                 end
                 
                 return {
                          -- Extract website title
                         title = splash:evaljs('document.title'),
                          -- Extract first table data
                         table_1_name = splash:evaljs('Highcharts.charts[0].title.textStr'),
                          -- Extract Timestamps
                         table_1_x = splash:evaljs('Highcharts.charts[0].series[0].xAxis.categories'),
                          -- Extract Finanzierungsstand
                         table_1_y_data = splash:evaljs('Highcharts.charts[0].series[1].yData'),
                         table_1_y_name = splash:evaljs('Highcharts.charts[0].title.textStr'),
         
                         -- Extract second table data
                         table_2_y1_data = table_2_y1_data,
                         table_2_y1_name = table_2_y1_name,
                         table_2_y3_data = table_2_y3_data,
                         table_2_y3_name = table_2_y3_name,
                         cookies = splash:get_cookies(),
                     }
            end
         """
        SCRAPY_ARGS = {
             'lua_source': LUA_SCRIPT, 
             'cookies' : self.cookies
             }

        # Look for JSON data if we successfully logged in
        yield SplashRequest(url=response.url,
                            callback=self.parse_highchart_data,
                            endpoint='execute', args=SCRAPY_ARGS,
                            session_id="foo")
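
For the callback, here is a minimal sketch of how the returned table might be turned into rows. It assumes scrapy-splash hands the Lua table to the callback as a dict (`response.data`) using the key names from the script above, and that the JavaScript arrays arrive as Python lists. `pair_series` is a hypothetical helper name, not part of scrapy-splash.

```python
from typing import Any, Dict, List, Optional


def pair_series(data: Dict[str, Any], x_key: str, y_key: str) -> List[Dict[str, Any]]:
    """Zip the x categories with one y series from the Splash result dict.

    Returns an empty list if either key is missing, which is the case for
    the optional second-chart series when the Lua script never set them.
    """
    xs: Optional[list] = data.get(x_key)
    ys: Optional[list] = data.get(y_key)
    if not xs or not ys:
        return []
    # zip() stops at the shorter list, so a length mismatch cannot raise.
    return [{"x": x, "y": y} for x, y in zip(xs, ys)]


# Inside the spider, the callback could then look roughly like this
# (assumption: response.data holds the table returned by the Lua script):
#
#     def parse_highchart_data(self, response):
#         for row in pair_series(response.data, "table_1_x", "table_1_y_data"):
#             yield row
```

Keeping the pairing logic outside the spider makes it easy to check against a hand-written dict before running a full crawl.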

Note: The Highcharts API also has a getCSV() method, which exports the chart data in CSV format. However, it seems this site has blocked that function.
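
Since getCSV is not usable here, the CSV can be assembled on the Python side instead. This is a small sketch using only the standard library. It assumes the categories and each series' `yData` have already been extracted as plain lists, and `series_to_csv` is a made-up helper name.

```python
import csv
import io
from typing import Dict, List, Sequence


def series_to_csv(categories: Sequence, series: Dict[str, Sequence]) -> str:
    """Build CSV text with one row per category and one column per series."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    names: List[str] = list(series)
    writer.writerow(["category", *names])
    for i, category in enumerate(categories):
        # Leave a cell empty when a series is shorter than the category list.
        row = [series[n][i] if i < len(series[n]) else "" for n in names]
        writer.writerow([category, *row])
    return buffer.getvalue()
```

For example, `series_to_csv(data["table_1_x"], {"Finanzierungsstand": data["table_1_y_data"]})` would give one column for the first chart, where `data` is the dict returned by the Lua script.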

Upvotes: 0

Views: 1648

Answers (2)

Loco Barocco

Reputation: 179

This worked for me: console.log(Highcharts.charts[1].series[0].processedYData)

Upvotes: 0

Sebastian Wędzel

Reputation: 11633

It's not exactly a scraping/fetching approach, but on a page that uses Highcharts you can inspect the whole chart configuration from the browser's web console. Try:

console.log(Highcharts.charts), which shows the array of charts rendered on the page. Next, drill down into a particular chart -> series -> data, for example:

console.log(Highcharts.charts[0].series[1].data)
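
`series.data` holds Highcharts point objects rather than plain numbers, so when the values need to leave the browser (for example through Splash's `evaljs`), it tends to be easier to map them to plain values and serialize them to JSON first. The following is a sketch under those assumptions. `EXTRACT_JS` and `parse_charts` are illustrative names, and the expression skips empty slots in `Highcharts.charts` and in `series.data` as a precaution.

```python
import json
from typing import Any, Dict, List

# JavaScript expression to evaluate in the page. Only series.name, point.x
# and point.y are read, so the result is plain JSON-serializable data.
EXTRACT_JS = """
JSON.stringify(
  Highcharts.charts
    .filter(function (chart) { return chart; })
    .map(function (chart) {
      return chart.series.map(function (s) {
        return {
          name: s.name,
          points: s.data
            .filter(function (p) { return p; })
            .map(function (p) { return [p.x, p.y]; })
        };
      });
    })
)
"""


def parse_charts(payload: str) -> List[List[Dict[str, Any]]]:
    """Decode the JSON string produced by EXTRACT_JS into Python lists."""
    return json.loads(payload)
```

The result is a list of charts, each a list of series with a `name` and a list of `[x, y]` pairs.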

Upvotes: 1
