Cos
Cos

Reputation: 1709

What are best practices for JSON data aggregation?

Let's say I have a dataset in the following format:

var smallTestData = [
{"YEAR": "2009", "MONTH": "1", "CUSTOMER": "Customer1", "REVENUE": "1938.49488391425"},
{"YEAR": "2009", "MONTH": "1", "CUSTOMER": "Customer2", "REVENUE": "75.9142774343491"},
{"YEAR": "2009", "MONTH": "1", "CUSTOMER": "Customer2", "REVENUE": "99.3456067931875"} ...];

Now, if I wanted to draw a D3 or C3 chart in which to display the cummulated REVENUE per YEAR and MONTH for each CUSTOMER, I reckon I would have to end up with something like this:

   [{"yearMonth":"2009 1","revenueCustomer1":158989,"revenueCustomer2":68181},
    {"yearMonth":"2009 2","revenueCustomer1":171217,"revenueCustomer2":204975},
    {"yearMonth":"2009 3","revenueCustomer1":38477,"revenueCustomer2":46605} ...];

Sure, this doesn't look very elegant, but that's nothing. The worst part is the aggregation of the measures (ex. REVENUE) based on multiple dimensions (ex. YEAR, MONTH, CUSTOMER), which is a pain to do for JSON data.

I have tried writing my own aggregation functions to cover this problem, but couldn't find any satisfactory solution except manually adding the values together. Can anyone guide me in the right direction? How would you go about aggregating your data to fit the kind of chart I described? Is there any readily available solution?

And since we're on the topic, how would you go about joining two or more datasets based on primary keys?

Thank you!

Upvotes: 1

Views: 4728

Answers (2)

meetamit
meetamit

Reputation: 25157

var smallTestData = [
  {"YEAR": "2009", "MONTH": "1", "CUSTOMER": "Customer1", "REVENUE": "1938.49488391425"},
  {"YEAR": "2009", "MONTH": "1", "CUSTOMER": "Customer2", "REVENUE": "75.9142774343491"},
  {"YEAR": "2009", "MONTH": "1", "CUSTOMER": "Customer2", "REVENUE": "99.3456067931875"},
  {"YEAR": "2009", "MONTH": "2", "CUSTOMER": "Customer1", "REVENUE": "1938.49488391425"},
  {"YEAR": "2009", "MONTH": "2", "CUSTOMER": "Customer2", "REVENUE": "75.9142774343491"},
  {"YEAR": "2009", "MONTH": "2", "CUSTOMER": "Customer2", "REVENUE": "99.3456067931875"},
  {"YEAR": "2008", "MONTH": "1", "CUSTOMER": "Customer1", "REVENUE": "1938.49488391425"},
  {"YEAR": "2008", "MONTH": "1", "CUSTOMER": "Customer1", "REVENUE": "75.9142774343491"},
  {"YEAR": "2008", "MONTH": "1", "CUSTOMER": "Customer2", "REVENUE": "99.3456067931875"},
  {"YEAR": "2008", "MONTH": "2", "CUSTOMER": "Customer1", "REVENUE": "1938.49488391425"},
  {"YEAR": "2008", "MONTH": "2", "CUSTOMER": "Customer1", "REVENUE": "75.9142774343491"},
  {"YEAR": "2008", "MONTH": "2", "CUSTOMER": "Customer2", "REVENUE": "99.3456067931875"},
  {"YEAR": "2007", "MONTH": "1", "CUSTOMER": "Customer1", "REVENUE": "1938.49488391425"},
  {"YEAR": "2007", "MONTH": "1", "CUSTOMER": "Customer2", "REVENUE": "75.9142774343491"},
  {"YEAR": "2007", "MONTH": "1", "CUSTOMER": "Customer2", "REVENUE": "99.3456067931875"}
];

var nested = d3.nest()
  .key(function(d) { return d.CUSTOMER; }) // nest first by customer
  .key(function(d) { return d.YEAR; }) // then-by year
  .key(function(d) { return d.MONTH; }) // then-by month
  .rollup(function(values) {
    return d3.sum(values, function(d) { return d.REVENUE; });
  })
  .map(smallTestData)

console.log(nested);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.4.11/d3.min.js"></script>

Run this snippet, open the console window and see the logged result. If instead of .map(nested) you use .entries(nested) you get the same thing but represented as a series of nested arrays, which may make it more convenient to bind to d3 selections (using their .data() method).

Varying the order of the .key() functions controls the order of nesting.

Upvotes: 3

Ben Lyall
Ben Lyall

Reputation: 1976

This seems like the perfect time to bust out d3.nest(). Have a look at https://github.com/mbostock/d3/wiki/Arrays#-nest to learn more.

For your data and what you're looking to do, you can do the following:

var nestedData = d3.nest().key(function(d) { return d.YEAR + " " + d.MONTH; })
                          .key(function(d) { return d.CUSTOMER; })
                          .rollup(function(leaves) {
                              return d3.sum(leaves, function(d) {
                                  return +d.REVENUE;
                              });
                           })
                           .entries(smallTestData);

This will return an object similar to:

[ 
    { 
        key: "2009 1",
        values: [
            { 
              key: "Customer1"
              values: 1938.49488391425
            },
            { 
              key: "Customer2"
              values: 175.2598842275366
            }
      },
      { ... }
]

The way d3.nest works is the key functions define the object properties that you wish to aggregate by, and the rollup function is used to summarise all the data that matches the key functions. Each key function will create a new depth of nesting. If you don't have the rollup function, then the values for each key will just be an array of all values in your data that match the values defined in the key functions.

Upvotes: 6

Related Questions