Ami Tavory

Reputation: 76297

Dimensional Charting with Non-Exclusive Attributes

The following is a schematic, simplified table of HTTP transactions. I'd like to build a dimensional-charting analysis of it using dc, but some of the columns don't map well to crossfilter.

[Image: table of schematic HTTP transactions]

In the setting of this question, all HTTP transactions have the fields time, host, requestHeaders, responseHeaders, and numBytes. However, different transactions carry different HTTP request and response headers. In the table above, 0 and 1 represent the absence and presence, respectively, of a specific header in a specific transaction. The sub-columns of requestHeaders and responseHeaders are the union of the headers present across all transactions, so different HTTP transaction datasets will almost surely generate different sub-columns.

For this question, a row in this table is represented in code like this:

{
    "time": 0,
    "host": "a.com",
    "requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
    "responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
    "numBytes": 12
}

The time, host, and numBytes fields all translate easily into crossfilter, so it's possible to build charts answering questions like: what was the total number of bytes seen for transactions between times 2 and 4 for host a.com? E.g.,

var ndx = crossfilter(data);
...
var hostDim = ndx.dimension(function(d) {
    return d.host;
});
var hostBytes = hostDim.group().reduceSum(function(d) {
    return d.numBytes;
});

The problem is that, for all slices of time and host, I'd like to show (capped) bar charts of the (leading) request and response headers by bytes. E.g. (see the first row), for time 0 and host a.com, the request headers bar chart should show that bar and baz each have 12.
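
To make the expected numbers concrete, here is a plain-JavaScript sketch of the aggregation I'm after, with no crossfilter or dc involved. The helper name bytesByHeader and the second sample row are made up for illustration.

```javascript
// Sample rows in the shape described above (the second row is made up).
var sampleTransactions = [
  {time: 0, host: "a.com", requestHeaders: {foo: 0, bar: 1, baz: 1},
   responseHeaders: {shmip: 0, shmap: 1, shmoop: 0}, numBytes: 12},
  {time: 1, host: "b.org", requestHeaders: {foo: 1, bar: 0, baz: 1},
   responseHeaders: {shmip: 0, shmap: 1, shmoop: 1}, numBytes: 3}
];

// Hypothetical helper: total bytes per header over the rows that pass inSlice.
// field is "requestHeaders" or "responseHeaders".
function bytesByHeader(rows, field, inSlice) {
  var totals = {};
  rows.filter(inSlice).forEach(function(row) {
    Object.keys(row[field]).forEach(function(header) {
      // The 0/1 presence flag doubles as a multiplier for the byte count.
      totals[header] = (totals[header] || 0) + row[field][header] * row.numBytes;
    });
  });
  return totals;
}

var slice = bytesByHeader(sampleTransactions, "requestHeaders", function(row) {
  return row.time === 0 && row.host === "a.com";
});
console.log(slice); // { foo: 0, bar: 12, baz: 12 }
```

The same transaction contributes its bytes to both bar and baz, which is exactly what makes the headers non-exclusive.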

There are two problems, a minor one and a major one.

Minor Problem

This doesn't fit quite naturally into dc, as it's one-directional. These bar charts should update when the other charts are filtered, but they shouldn't be usable for filtering themselves. E.g., you shouldn't be able to select bar and deselect baz and then look at the resulting breakdown of hosts by bytes, because what would this mean: hosts in the transactions that have bar but don't have baz? Hosts in the transactions that have bar and either do or don't have baz? It's too unintuitive.

How can I make some dc charts one-directional? Is it through some hack of disabling mouse inputs?

Major Problem

As opposed to host, foo and bar are non-exclusive. Each transaction has exactly one host, but a transaction's headers might include any combination of foo and bar.

How can I define crossfilter dimensions for requestHeaders, then, and how can I use them with dc? That is:

var ndx = crossfilter(data);
...
var requestHeadersDim = ndx.dimension(function(d) {
    // What should go here? 
});

Upvotes: 1

Views: 153

Answers (2)

Ami Tavory

Reputation: 76297

Hacked it (efficiently, but very inelegantly) by looking at the source code of dc. It's possible to distort the meaning of crossfilter to achieve the desired effect.

The final result is in this fiddle. It is slightly more limited than the question, as the fields of requestHeaders are hardcoded to foo, bar, and baz. Removing this restriction is more in the domain of simple JavaScript.

Minor Problem

Using a CSS hack, I simply defined

.avoid-clicks {
    pointer-events: none;
}

and gave the chart's div this class. Inelegant but effective.

Major Problem

The major problem is solved by distorting the meaning of crossfilter concepts, and "fooling" dc.

Let's say the data looks like this:

var transactions = [
  {
      "time": 0,
      "host": "a.com",
      "requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
      "responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
      "numBytes": 12
  },
  {
      "time": 1,
      "host": "b.org",
      "requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
      "responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 1},
      "numBytes": 3
  },
  ...
];

We can define a "dummy" dimension, which ignores the data:

var transactionsNdx = crossfilter(transactions);

var dummyDim = transactionsNdx
  .dimension(function(d) {
    return 0;
  });

Using this dimension, we can define a group that counts the total foo, bar, and baz bytes of the filtered rows:

var requestHeadersGroup = dummyDim
  .group()
  .reduce(
    /* callback for when data is added to the current filter results */
    function (p, v) {
      return {
        "foo": p.foo + v.requestHeaders.foo * v.numBytes,
        "bar": p.bar + v.requestHeaders.bar * v.numBytes,
        "baz": p.baz + v.requestHeaders.baz * v.numBytes,
      }
    },
    /* callback for when data is removed from the current filter results */
    function (p, v) {
      return {
        "foo": p.foo - v.requestHeaders.foo * v.numBytes,
        "bar": p.bar - v.requestHeaders.bar * v.numBytes,
        "baz": p.baz - v.requestHeaders.baz * v.numBytes,
      }
    },
    /* initialize p */
    function () {
      return {
        "foo": 0,
        "bar": 0,
        "baz": 0
      }
    }
  );
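
The hardcoded header names in these callbacks are not essential. As a sketch (this is not what the fiddle does), the three callbacks could be generated from whatever header names appear in the data. The helper names collectHeaderNames and makeHeaderReducers are hypothetical; the callbacks keep the (p, v) add/remove and zero-argument initial shapes used above, and are exercised here by hand so the snippet runs without crossfilter.

```javascript
// Hypothetical helper: union of header names found under field in any row.
function collectHeaderNames(rows, field) {
  var seen = {};
  rows.forEach(function(row) {
    Object.keys(row[field]).forEach(function(name) { seen[name] = true; });
  });
  return Object.keys(seen);
}

// Hypothetical helper: builds add/remove/initial callbacks for a header list.
function makeHeaderReducers(field, names) {
  function step(sign) {
    return function(p, v) {
      names.forEach(function(name) {
        // Headers missing from a row count as absent (0).
        p[name] += sign * (v[field][name] || 0) * v.numBytes;
      });
      return p;
    };
  }
  return {
    add: step(+1),
    remove: step(-1),
    initial: function() {
      var p = {};
      names.forEach(function(name) { p[name] = 0; });
      return p;
    }
  };
}

// Exercising the callbacks by hand on two made-up rows:
var rows = [
  {requestHeaders: {foo: 0, bar: 1}, numBytes: 12},
  {requestHeaders: {bar: 1, baz: 1}, numBytes: 3}
];
var names = collectHeaderNames(rows, "requestHeaders");
var reducers = makeHeaderReducers("requestHeaders", names);
var totals = rows.reduce(reducers.add, reducers.initial());
console.log(totals); // { foo: 0, bar: 15, baz: 3 }
```

With crossfilter, the three functions would presumably be passed as dummyDim.group().reduce(reducers.add, reducers.remove, reducers.initial). Unlike the callbacks above, this sketch mutates and returns p instead of building a new object each time.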

Note that requestHeadersGroup isn't a proper crossfilter group at all: it does not map header names to their values. Rather, it maps the single key 0 to an object which itself maps header names to their byte totals (ugly!). We therefore need to wrap it in something that actually looks like a crossfilter group:

var getSortedFromGroup = function() {
  var all = requestHeadersGroup.all()[0].value;
  all = [
    {
      "key": "foo",
      "value": all.foo
    },
    {
      "key": "bar",
      "value": all.bar
    },
    {
      "key": "baz",
      "value": all.baz
    }];
  return all.sort(function(lhs, rhs) {
    return lhs.value - rhs.value;
  });
};
var requestHeadersDisplayGroup = {
  /* the k largest entries, largest first */
  "top": function(k) {
      return getSortedFromGroup().reverse().slice(0, k);
    },
  "all": function() {
      return getSortedFromGroup();
    }
};

We now can create a regular dc chart, and pass the adaptor group requestHeadersDisplayGroup to it. It works normally from this point on.
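
The adaptor does not have to list the header names by hand either. A minimal sketch, assuming the wrapped group has a single bucket whose value is a {name: total} object as above; makeDisplayGroup and stubGroup are hypothetical names, and the stub stands in for requestHeadersGroup so the snippet runs without crossfilter.

```javascript
// Hypothetical helper: wraps a single-bucket group in an object exposing
// all() and top(k), the two group methods the adaptor above provides.
function makeDisplayGroup(sourceGroup) {
  function entries() {
    var buckets = sourceGroup.all();
    // With no records at all there is no bucket to unpack.
    var totals = buckets.length ? buckets[0].value : {};
    return Object.keys(totals).map(function(name) {
      return {key: name, value: totals[name]};
    });
  }
  return {
    all: function() {
      return entries();
    },
    top: function(k) {
      // Largest totals first, truncated to k entries.
      return entries().sort(function(lhs, rhs) {
        return rhs.value - lhs.value;
      }).slice(0, k);
    }
  };
}

// Stub standing in for requestHeadersGroup.
var stubGroup = {
  all: function() {
    return [{key: 0, value: {foo: 0, bar: 15, baz: 3}}];
  }
};
var displayGroup = makeDisplayGroup(stubGroup);
console.log(displayGroup.top(2));
// [ { key: 'bar', value: 15 }, { key: 'baz', value: 3 } ]
```

Because the entries are rebuilt on every call, the adaptor always reflects the current filter state of the underlying group.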

Upvotes: 1

Ethan Jewett
Ethan Jewett

Reputation: 6010

The way I usually deal with the major problem you state is to transform my data so that there is a separate record for each header (all other fields in these duplicate records are the same). Then I use custom group aggregations to avoid double-counting. These custom aggregations are a bit hard to manage, so I built Reductio to help with this, using its 'exception' function: github.com/esjewett/reductio
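
A minimal sketch of that transformation in plain JavaScript, assuming each transaction is given an id (added here from its array index; it isn't in the question's data) so that duplicates can be recognized later. The helper names are hypothetical and this is not Reductio's API; it only illustrates why a naive sum double-counts and how keying on the id avoids it.

```javascript
// Hypothetical helper: emits one record per header present in each transaction.
function explodeByHeader(transactions, field) {
  var records = [];
  transactions.forEach(function(t, index) {
    Object.keys(t[field]).forEach(function(name) {
      if (t[field][name]) {
        records.push({id: index, time: t.time, host: t.host,
                      header: name, numBytes: t.numBytes});
      }
    });
  });
  return records;
}

// Hypothetical helper: sums numBytes per key, counting each id once per key.
function sumOncePerId(records, keyOf) {
  var totals = {};
  var seen = {};
  records.forEach(function(r) {
    var key = keyOf(r);
    var tag = key + "|" + r.id;
    if (!seen[tag]) {
      seen[tag] = true;
      totals[key] = (totals[key] || 0) + r.numBytes;
    }
  });
  return totals;
}

var records = explodeByHeader([
  {time: 0, host: "a.com", requestHeaders: {foo: 0, bar: 1, baz: 1}, numBytes: 12},
  {time: 1, host: "b.org", requestHeaders: {foo: 0, bar: 1, baz: 1}, numBytes: 3}
], "requestHeaders");

console.log(sumOncePerId(records, function(r) { return r.header; }));
// { bar: 15, baz: 15 }
console.log(sumOncePerId(records, function(r) { return r.host; }));
// { 'a.com': 12, 'b.org': 3 }  (a naive sum would give 24 and 6)
```

One caveat of this sketch: a transaction with no headers set produces no records at all, so it would vanish from the host totals unless a placeholder record is emitted for it.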

Upvotes: 1
