jerome

Reputation: 2089

Large dataset visualisation

I have hourly data spanning more than 20 years, and I would like some hints on how to display such a large amount of data in the browser. I want to display the data as time series, because all of the data sets have the same format (a value at a certain time) but represent different kinds of information. I looked at d3.js and managed to plot all my data (20 years or more) and use brushing to zoom in, based on this very good example.
But the browser can't handle that much data and becomes extremely slow.
On the server side I use servlets to send the data in JSON format.
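
For reference, the servlet side is roughly like the simplified sketch below; the DataPoint class and the loadSeries() helper are placeholders for my real data access code:

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.ArrayList;
    import java.util.List;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class SeriesServlet extends HttpServlet {

        // Every data set shares this shape: a value at a certain time.
        static class DataPoint {
            final long time;     // epoch milliseconds
            final double value;
            DataPoint(long time, double value) { this.time = time; this.value = value; }
        }

        // Placeholder: in reality the points come from my data store.
        private List<DataPoint> loadSeries() {
            return new ArrayList<DataPoint>();
        }

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            List<DataPoint> points = loadSeries();

            // write the whole series as a JSON array of {time, value} objects
            resp.setContentType("application/json");
            PrintWriter out = resp.getWriter();
            out.print("[");
            for (int i = 0; i < points.size(); i++) {
                DataPoint p = points.get(i);
                if (i > 0) out.print(",");
                out.print("{\"time\":" + p.time + ",\"value\":" + p.value + "}");
            }
            out.print("]");
        }
    }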


Thanks for any advice, hints, and examples on best practices for visualizing large datasets.

Upvotes: 2

Views: 2144

Answers (2)

mike-schultz

Reputation: 2364

An issue with using libraries such as d3.js is that they rely on SVG, creating and maintaining a DOM object for every data point. This can obviously lead to a DOM explosion, depending on your dataset size. You could sample the data before sending it to the browser and rendering it, but granularity and accuracy could be lost. Maybe you need those non-outlier points to identify trends. It really depends on the size of your dataset, though.
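
To make the sampling idea concrete, a naive server-side approach is to keep only every n-th point before the series is sent to the browser. A rough sketch in Java (the DataPoint type is just an assumed time/value record, not something from your code):

    import java.util.ArrayList;
    import java.util.List;

    public class Downsampler {

        // Assumed record shape: a timestamp in epoch milliseconds plus a value.
        public static class DataPoint {
            public final long time;
            public final double value;
            public DataPoint(long time, double value) { this.time = time; this.value = value; }
        }

        // Keeps roughly maxPoints points by taking every n-th record.
        // Simple and fast, but peaks between the kept points are lost.
        public static List<DataPoint> everyNth(List<DataPoint> points, int maxPoints) {
            if (points.size() <= maxPoints) {
                return points;
            }
            int step = (int) Math.ceil(points.size() / (double) maxPoints);
            List<DataPoint> sampled = new ArrayList<DataPoint>();
            for (int i = 0; i < points.size(); i += step) {
                sampled.add(points.get(i));
            }
            return sampled;
        }
    }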

Assuming you have a dataset of ~175,200 points (one for every hour in 20 years), I would suggest a library called ZingChart (http://www.zingchart.com). It has many different styling options, but more importantly it has different rendering capabilities (SVG or canvas) that can handle the amount of data you are trying to visualize. In particular, take note of the zoom feature, which can visualize every single point, along with the ability to add custom tags to each node.

Upvotes: 3

Andrei Bucurei

Reputation: 2438

Don't bring all the data to the client side.

Instead, you could implement a server side method that will look like this:
getData(startDate, endDate, maxSteps)

This method will always return at most maxSteps records; exactly which records is totally up to you and your data. I would suggest one of the following approaches:

The following steps are common for both methods:

  • get all records available between startDate and endDate
  • if there are fewer records than maxSteps, return all of them

Otherwise, using the subset of records determined by startDate and endDate, continue with one of the following methods.

Method 1: get exact records from your data. Can be expensive to determine the right ones:

  • determine equidistant points in time between startDate and endDate
  • get the records from your data that are closest to those points

    // dates are epoch milliseconds; getClosestTo() fetches the record nearest a timestamp
    if (maxSteps == 1)
    {
        // avoid the division by zero below
        records.add(getClosestTo((startDate + endDate) / 2));
        return records;
    }
    long stepTimeSpan = (endDate - startDate) / (maxSteps - 1);
    long point = startDate;
    for (int i = 0; i < maxSteps; i++)
    {
        // take the stored record closest to each equidistant point
        records.add(getClosestTo(point));
        point = point + stepTimeSpan;
    }
    return records;
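
If the records are sorted by time and fit in memory, getClosestTo does not have to be expensive. Below is a possible sketch using binary search; the Record class and the in-memory arrays are assumptions on my side, and in practice this could just as well be a query against an indexed timestamp column:

    import java.util.Arrays;

    public class ClosestLookup {

        // Assumed record shape: a timestamp in epoch milliseconds plus a value.
        public static class Record {
            public final long time;
            public final double value;
            public Record(long time, double value) { this.time = time; this.value = value; }
        }

        private final Record[] sortedByTime; // records sorted ascending by time (non-empty)
        private final long[] times;          // parallel array of timestamps for binarySearch

        public ClosestLookup(Record[] sortedByTime) {
            this.sortedByTime = sortedByTime;
            this.times = new long[sortedByTime.length];
            for (int i = 0; i < sortedByTime.length; i++) {
                times[i] = sortedByTime[i].time;
            }
        }

        // Returns the record whose timestamp is closest to 'point', in O(log n).
        public Record getClosestTo(long point) {
            int idx = Arrays.binarySearch(times, point);
            if (idx >= 0) {
                return sortedByTime[idx];      // exact hit
            }
            int insertion = -idx - 1;          // index of the first timestamp greater than 'point'
            if (insertion == 0) {
                return sortedByTime[0];
            }
            if (insertion == times.length) {
                return sortedByTime[times.length - 1];
            }
            long before = point - times[insertion - 1];
            long after = times[insertion] - point;
            return (before <= after) ? sortedByTime[insertion - 1] : sortedByTime[insertion];
        }
    }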
    

Method 2: return records resulting from aggregations:

  • split the records into maxSteps buckets (by date)
  • obtain one record from each bucket as the result of an aggregation

    // dates are epoch milliseconds; getRecordsBetween() queries one bucket's records
    long bucketTimeSpan = (endDate - startDate) / maxSteps;
    long bucketStart = startDate;
    for (int i = 0; i < maxSteps; i++)
    {
        List<Record> bucket = getRecordsBetween(bucketStart, bucketStart + bucketTimeSpan);
        // one aggregated record per bucket: the average timestamp and the average value
        // (skip empty buckets so the averages stay well defined)
        if (!bucket.isEmpty())
            records.add(new Record(avgDate(bucket), avgValue(bucket)));
        bucketStart = bucketStart + bucketTimeSpan;
    }
    return records;
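
If querying the storage once per bucket turns out to be too slow, the same aggregation can be done in a single pass over the records of the interval. A rough sketch of that, assuming the records are already sorted by time and that a record is just a timestamp plus a value (the Record class and method names here are mine, not yours):

    import java.util.ArrayList;
    import java.util.List;

    public class BucketAggregator {

        // Assumed record shape: a timestamp in epoch milliseconds plus a value.
        public static class Record {
            public final long time;
            public final double value;
            public Record(long time, double value) { this.time = time; this.value = value; }
        }

        // Reduces 'all' (sorted by time, all inside [startDate, endDate)) to at most
        // maxSteps averaged records, visiting every record exactly once.
        public static List<Record> aggregate(List<Record> all, long startDate, long endDate, int maxSteps) {
            if (all.size() <= maxSteps) {
                return all; // the common early-return step: nothing to reduce
            }
            long bucketTimeSpan = (endDate - startDate) / maxSteps;
            if (bucketTimeSpan == 0) {
                bucketTimeSpan = 1; // degenerate interval, avoid division by zero
            }
            List<Record> result = new ArrayList<Record>();
            long sumTime = 0;
            double sumValue = 0;
            int count = 0;
            int bucketIndex = 0;
            for (Record r : all) {
                // index of the bucket this record falls into
                int idx = (int) Math.min((r.time - startDate) / bucketTimeSpan, maxSteps - 1);
                if (idx != bucketIndex && count > 0) {
                    // close the previous bucket with one averaged record
                    result.add(new Record(sumTime / count, sumValue / count));
                    sumTime = 0;
                    sumValue = 0;
                    count = 0;
                }
                bucketIndex = idx;
                sumTime += r.time;
                sumValue += r.value;
                count++;
            }
            if (count > 0) {
                result.add(new Record(sumTime / count, sumValue / count)); // last bucket
            }
            return result;
        }
    }

With a database, the equivalent is a single query that groups by the computed bucket index instead of issuing one query per bucket.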
    

Call this method from the client side each time the user changes the interval (using the small brushing chart at the bottom in your example).

Play with the maxSteps value until you find the right balance between performance and detail. For example, with 20 years of hourly data (about 175,200 records) and maxSteps = 1000, each plotted point would summarize roughly a week of data.

Upvotes: 6
