jerome

Reputation: 2089

Large dataset visualisation

I have hourly data spanning more than 20 years, and I would like some hints on how to display such a large amount of data in the browser. I want to display the data as time series, because all of the data sets have the same format (a value at a certain time) but represent different kinds of information. I looked at d3.js and managed to plot all my data (20 years or more) and use brushing to zoom in, based on this very good example.
But the browser can't handle that much data and becomes extremely slow.
On the server side I use servlets to send the data in JSON format.
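
For reference, the servlet side is roughly like the simplified sketch below; the DataPoint class and the loadSeries() helper are placeholders for my real data access code:

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.ArrayList;
    import java.util.List;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class SeriesServlet extends HttpServlet {

        // Every data set shares this shape: a value at a certain time.
        static class DataPoint {
            final long time;     // epoch milliseconds
            final double value;
            DataPoint(long time, double value) { this.time = time; this.value = value; }
        }

        // Placeholder: in reality the points come from my data store.
        private List<DataPoint> loadSeries() {
            return new ArrayList<DataPoint>();
        }

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            List<DataPoint> points = loadSeries();

            // write the whole series as a JSON array of {time, value} objects
            resp.setContentType("application/json");
            PrintWriter out = resp.getWriter();
            out.print("[");
            for (int i = 0; i < points.size(); i++) {
                DataPoint p = points.get(i);
                if (i > 0) out.print(",");
                out.print("{\"time\":" + p.time + ",\"value\":" + p.value + "}");
            }
            out.print("]");
        }
    }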


Thanks for any advice, hints, and examples on best practices for visualizing large datasets.

Upvotes: 2

Views: 2144

Answers (2)

mike-schultz

Reputation: 2364

An issue with using libraries such as d3.js is that they rely on SVG, creating and maintaining a DOM object for every data point. This can obviously lead to a DOM explosion, depending on your dataset size. You could sample the data before sending it to the browser and rendering it, but granularity and accuracy could be lost. Maybe you need those non-outlier points to identify trends. It really depends on the size of your dataset, though.
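
To make the sampling idea concrete, a naive server-side approach is to keep only every n-th point before the series is sent to the browser. A rough sketch in Java (the DataPoint type is just an assumed time/value record, not something from your code):

    import java.util.ArrayList;
    import java.util.List;

    public class Downsampler {

        // Assumed record shape: a timestamp in epoch milliseconds plus a value.
        public static class DataPoint {
            public final long time;
            public final double value;
            public DataPoint(long time, double value) { this.time = time; this.value = value; }
        }

        // Keeps roughly maxPoints points by taking every n-th record.
        // Simple and fast, but peaks between the kept points are lost.
        public static List<DataPoint> everyNth(List<DataPoint> points, int maxPoints) {
            if (points.size() <= maxPoints) {
                return points;
            }
            int step = (int) Math.ceil(points.size() / (double) maxPoints);
            List<DataPoint> sampled = new ArrayList<DataPoint>();
            for (int i = 0; i < points.size(); i += step) {
                sampled.add(points.get(i));
            }
            return sampled;
        }
    }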

Assuming you have a dataset of ~175,200 points (one for every hour in 20 years), I would suggest a library called ZingChart (http://www.zingchart.com). It has many different styling options, but more importantly it has different rendering capabilities (SVG or canvas) that can handle the amount of data you are trying to visualize. In particular, take note of the zoom feature, which can visualize every single point, along with the ability to add custom tags to each node.

Upvotes: 3

Andrei Bucurei

Reputation: 2438

Don't bring all the data to the client side.

Instead, you could implement a server side method that will look like this:
getData(startDate, endDate, maxSteps)

This method will always return at most maxSteps records; exactly which records is totally up to you and your data. I would suggest one of the following approaches:

The following steps are common for both methods:

  • get all records available between startDate and endDate
  • if there are fewer records than maxSteps, return all of them

Otherwise, using the subset of records determined by startDate and endDate, continue with one of the following methods.

Method 1: get exact records from your data. Can be expensive to determine the right ones:

  • determine equidistant points in time between startDate and endDate
  • get the records from your data that are closest to those points

    // dates are epoch milliseconds; getClosestTo() fetches the record nearest a timestamp
    if (maxSteps == 1)
    {
        // avoid the division by zero below
        records.add(getClosestTo((startDate + endDate) / 2));
        return records;
    }
    long stepTimeSpan = (endDate - startDate) / (maxSteps - 1);
    long point = startDate;
    for (int i = 0; i < maxSteps; i++)
    {
        // take the stored record closest to each equidistant point
        records.add(getClosestTo(point));
        point = point + stepTimeSpan;
    }
    return records;
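
If the records are sorted by time and fit in memory, getClosestTo does not have to be expensive. Below is a possible sketch using binary search; the Record class and the in-memory arrays are assumptions on my side, and in practice this could just as well be a query against an indexed timestamp column:

    import java.util.Arrays;

    public class ClosestLookup {

        // Assumed record shape: a timestamp in epoch milliseconds plus a value.
        public static class Record {
            public final long time;
            public final double value;
            public Record(long time, double value) { this.time = time; this.value = value; }
        }

        private final Record[] sortedByTime; // records sorted ascending by time (non-empty)
        private final long[] times;          // parallel array of timestamps for binarySearch

        public ClosestLookup(Record[] sortedByTime) {
            this.sortedByTime = sortedByTime;
            this.times = new long[sortedByTime.length];
            for (int i = 0; i < sortedByTime.length; i++) {
                times[i] = sortedByTime[i].time;
            }
        }

        // Returns the record whose timestamp is closest to 'point', in O(log n).
        public Record getClosestTo(long point) {
            int idx = Arrays.binarySearch(times, point);
            if (idx >= 0) {
                return sortedByTime[idx];      // exact hit
            }
            int insertion = -idx - 1;          // index of the first timestamp greater than 'point'
            if (insertion == 0) {
                return sortedByTime[0];
            }
            if (insertion == times.length) {
                return sortedByTime[times.length - 1];
            }
            long before = point - times[insertion - 1];
            long after = times[insertion] - point;
            return (before <= after) ? sortedByTime[insertion - 1] : sortedByTime[insertion];
        }
    }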
    

Method 2: return records resulting from aggregations:

  • split the records into maxSteps buckets (by date)
  • obtain one record from each bucket as the result of an aggregation

    // dates are epoch milliseconds; getRecordsBetween() queries one bucket's records
    long bucketTimeSpan = (endDate - startDate) / maxSteps;
    long bucketStart = startDate;
    for (int i = 0; i < maxSteps; i++)
    {
        List<Record> bucket = getRecordsBetween(bucketStart, bucketStart + bucketTimeSpan);
        // one aggregated record per bucket: the average timestamp and the average value
        // (skip empty buckets so the averages stay well defined)
        if (!bucket.isEmpty())
            records.add(new Record(avgDate(bucket), avgValue(bucket)));
        bucketStart = bucketStart + bucketTimeSpan;
    }
    return records;
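
If querying the storage once per bucket turns out to be too slow, the same aggregation can be done in a single pass over the records of the interval. A rough sketch of that, assuming the records are already sorted by time and that a record is just a timestamp plus a value (the Record class and method names here are mine, not yours):

    import java.util.ArrayList;
    import java.util.List;

    public class BucketAggregator {

        // Assumed record shape: a timestamp in epoch milliseconds plus a value.
        public static class Record {
            public final long time;
            public final double value;
            public Record(long time, double value) { this.time = time; this.value = value; }
        }

        // Reduces 'all' (sorted by time, all inside [startDate, endDate)) to at most
        // maxSteps averaged records, visiting every record exactly once.
        public static List<Record> aggregate(List<Record> all, long startDate, long endDate, int maxSteps) {
            if (all.size() <= maxSteps) {
                return all; // the common early-return step: nothing to reduce
            }
            long bucketTimeSpan = (endDate - startDate) / maxSteps;
            if (bucketTimeSpan == 0) {
                bucketTimeSpan = 1; // degenerate interval, avoid division by zero
            }
            List<Record> result = new ArrayList<Record>();
            long sumTime = 0;
            double sumValue = 0;
            int count = 0;
            int bucketIndex = 0;
            for (Record r : all) {
                // index of the bucket this record falls into
                int idx = (int) Math.min((r.time - startDate) / bucketTimeSpan, maxSteps - 1);
                if (idx != bucketIndex && count > 0) {
                    // close the previous bucket with one averaged record
                    result.add(new Record(sumTime / count, sumValue / count));
                    sumTime = 0;
                    sumValue = 0;
                    count = 0;
                }
                bucketIndex = idx;
                sumTime += r.time;
                sumValue += r.value;
                count++;
            }
            if (count > 0) {
                result.add(new Record(sumTime / count, sumValue / count)); // last bucket
            }
            return result;
        }
    }

With a database, the equivalent is a single query that groups by the computed bucket index instead of issuing one query per bucket.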
    

Call this method from the client side each time the user changes the interval (using the small brushing chart at the bottom in your example).

Play with the maxSteps value until you find the right balance between performance and detail. For example, with 20 years of hourly data (about 175,200 records) and maxSteps = 1000, each plotted point would summarize roughly a week of data.

Upvotes: 6
