Reputation: 76297
The following is a schematic, simplified table showing HTTP transactions. I'd like to build an analysis of it using dc, but some of the columns don't map well to crossfilter.
In the setting of this question, all HTTP transactions have the fields time, host, requestHeaders, responseHeaders, and numBytes. However, different transactions have different specific HTTP request and response headers. In the table above, 0 and 1 represent the absence and presence, respectively, of a specific header in a specific transaction. The sub-columns of requestHeaders and responseHeaders represent the unions of the headers present in the transactions. Different HTTP transaction datasets will almost surely generate different sub-columns.
For this question, a row in this chart is represented in code like this:
{
"time": 0,
"host": "a.com",
"requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
"responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
"numBytes": 12
}
The time, host, and numBytes fields all translate easily into crossfilter, so it's possible to build charts answering questions like: what was the total number of bytes seen for transactions between times 2 and 4 for host a.com? E.g.,
var ndx = crossfilter(data);
...
var hostDim = ndx.dimension(function(d) {
return d.host;
});
var hostBytes = hostDim.group().reduceSum(function(d) {
return d.numBytes;
});
The problem is that, for all slices of time and host, I'd like to show (capped) bar charts of the (leading) request and response headers by bytes. E.g. (see the first row), for time 0 and host a.com, the request headers bar chart should show that bar and baz each have 12.
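To make the target concrete, here is a minimal plain-JavaScript sketch of the numbers such a chart should show for one slice. It does not use crossfilter or dc at all; `headerBytes` and `sampleRows` are hypothetical names introduced only for illustration, and the single row is the first row from the example.

```javascript
// Plain-JavaScript sketch of the desired aggregation (no crossfilter involved).
// headerBytes is a hypothetical helper, not part of dc or crossfilter.
function headerBytes(rows, field) {
  var totals = {};
  rows.forEach(function (row) {
    Object.keys(row[field]).forEach(function (name) {
      // A flag of 1 contributes the row's bytes; a flag of 0 contributes nothing.
      totals[name] = (totals[name] || 0) + row[field][name] * row.numBytes;
    });
  });
  return totals;
}

var sampleRows = [
  {
    "time": 0,
    "host": "a.com",
    "requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
    "responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
    "numBytes": 12
  }
];

// The slice "time 0 and host a.com".
var slice = sampleRows.filter(function (d) {
  return d.time === 0 && d.host === "a.com";
});

// Bytes attributed to each request header within the slice.
var requestTotals = headerBytes(slice, "requestHeaders");
```

For this slice, `requestTotals.bar` and `requestTotals.baz` are both 12 and `requestTotals.foo` is 0, which is what the request headers bar chart should display.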
There are two problems, a minor one and a major one.
Minor Problem
This doesn't fit quite naturally into dc, as it's one-directional. These bar charts should be updated for the other slices, but they can't be used for slicing themselves. E.g., you shouldn't be able to select bar and deselect baz, and look for a resulting breakdown of hosts by bytes, because what would this mean: hosts in the transactions that have bar but don't have baz? Hosts in the transactions that have bar and either do or don't have baz? It's too unintuitive.
How can I make some dc charts one-directional? Is it through some hack of disabling mouse inputs?
Major Problem
As opposed to host, foo and bar are non-exclusive. Each transaction's host is either something or the other, but a transaction's headers might include any combination of foo and bar.
How can I define crossfilter dimensions for requestHeaders, then, and how can I use dc? That is:
var ndx = crossfilter(data);
...
var requestHeadersDim = ndx.dimension(function(d) {
// What should go here?
});
Upvotes: 1
Views: 153
Reputation: 76297
Hacked it (efficiently, but very inelegantly) by looking at the source code of dc. It's possible to distort the meaning of crossfilter to achieve the desired effect.
The final result is in this fiddle. It is slightly more limited than the question, as the fields of requestHeaders are hardcoded to foo, bar, and baz. Removing this restriction is more in the domain of simple JavaScript.
Minor Problem
Using a CSS hack, I simply defined
.avoid-clicks {
pointer-events: none;
}
and gave the div this class. Inelegant but effective.
Major Problem
The major problem is solved by distorting the meaning of crossfilter concepts, and "fooling" dc.
Let's say the data looks like this:
var transactions = [
{
"time": 0,
"host": "a.com",
"requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
"responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
"numBytes": 12
},
{
"time": 1,
"host": "b.org",
"requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
"responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 1},
"numBytes": 3
},
...
];
We can define a "dummy" dimension, which ignores the data:
var transactionsNdx = crossfilter(transactions);
var dummyDim = transactionsNdx
.dimension(function(d) {
return 0;
});
Using this dimension, we can define a group that counts the total foo, bar, and baz bytes of the filtered rows:
var requestHeadersGroup = dummyDim
.group()
.reduce(
/* callback for when data is added to the current filter results */
function (p, v) {
return {
"foo": p.foo + v.requestHeaders.foo * v.numBytes,
"bar": p.bar + v.requestHeaders.bar * v.numBytes,
"baz": p.baz + v.requestHeaders.baz * v.numBytes,
}
},
/* callback for when data is removed from the current filter results */
function (p, v) {
return {
"foo": p.foo - v.requestHeaders.foo * v.numBytes,
"bar": p.bar - v.requestHeaders.bar * v.numBytes,
"baz": p.baz - v.requestHeaders.baz * v.numBytes,
}
},
/* initialize p */
function () {
return {
"foo": 0,
"bar": 0,
"baz": 0
}
}
);
Note that this isn't a proper crossfilter group at all. It does not map each header name to its value. Rather, it maps the single key 0 to an object which itself maps the header names to their values (ugly!). We therefore need to transform this group into something that actually looks like a crossfilter group:
var getSortedFromGroup = function() {
var all = requestHeadersGroup.all()[0].value;
all = [
{
"key": "foo",
"value": all.foo
},
{
"key": "bar",
"value": all.bar
},
{
"key": "baz",
"value": all.baz
}];
return all.sort(function(lhs, rhs) {
return lhs.value - rhs.value;
});
}
var requestHeadersDisplayGroup = {
"top": function(k) {
// crossfilter's top(k) returns the k largest groups, largest value first
return getSortedFromGroup().reverse().slice(0, k);
},
"all": function() {
return getSortedFromGroup();
},
};
We can now create a regular dc chart, and pass the adaptor group requestHeadersDisplayGroup to it. It works normally from this point on.
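As for removing the hardcoded foo, bar, and baz, the following is a minimal sketch of one way it might be done, not the code from the fiddle. The helper names `collectHeaderNames`, `makeHeaderReducers`, and `toKeyValues` are hypothetical. The three functions returned by `makeHeaderReducers` are shaped so that they could be passed to `group().reduce(add, remove, initial)`; here they are exercised in plain JavaScript, with `Array.prototype.reduce` standing in for crossfilter adding rows.

```javascript
// Collect the union of header names seen under row[field] (hypothetical helper).
function collectHeaderNames(rows, field) {
  var seen = {};
  rows.forEach(function (row) {
    Object.keys(row[field]).forEach(function (name) { seen[name] = true; });
  });
  return Object.keys(seen);
}

// Build add/remove/initial reducers for an arbitrary list of header names.
function makeHeaderReducers(field, names) {
  function step(sign) {
    return function (p, v) {
      var next = {};
      names.forEach(function (name) {
        // A header missing from a row is treated as absent (0).
        next[name] = p[name] + sign * (v[field][name] || 0) * v.numBytes;
      });
      return next;
    };
  }
  return {
    add: step(+1),
    remove: step(-1),
    initial: function () {
      var zero = {};
      names.forEach(function (name) { zero[name] = 0; });
      return zero;
    }
  };
}

// Turn the single reduced object into key/value pairs, largest value first.
function toKeyValues(value) {
  return Object.keys(value)
    .map(function (name) { return {key: name, value: value[name]}; })
    .sort(function (lhs, rhs) { return rhs.value - lhs.value; });
}

var rows = [
  {time: 0, host: "a.com", numBytes: 12,
   requestHeaders: {foo: 0, bar: 1, baz: 1}},
  {time: 1, host: "b.org", numBytes: 3,
   requestHeaders: {foo: 0, bar: 1, baz: 1}}
];

var names = collectHeaderNames(rows, "requestHeaders");
var reducers = makeHeaderReducers("requestHeaders", names);

// Simulate crossfilter adding every row to the (single) dummy group.
var total = rows.reduce(reducers.add, reducers.initial());
var pairs = toKeyValues(total);
```

With the two sample rows, `total` holds 15 bytes each for bar and baz and 0 for foo. The adaptor group would then return `toKeyValues(...)` of the dummy group's single value instead of the hand-written three-element array.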
Upvotes: 1
Reputation: 6010
The way I usually deal with the major problem you state is to transform my data so that there is a separate record for each header (all other fields in these duplicate records are the same). Then I use custom group aggregations to avoid double-counting. These custom aggregations are a bit hard to manage so I built Reductio to help with this using the 'exception' function - github.com/esjewett/reductio
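A rough plain-JavaScript sketch of that idea follows. It is not Reductio's API and does not use crossfilter; `flattenByHeader` and `sumBytesOnce` are hypothetical helpers, and the `id` field is an assumption added so that duplicates of the same transaction can be recognized.

```javascript
// Emit one record per header that is present (flag 1) in row[field].
// Note: a transaction with no headers present produces no records at all.
function flattenByHeader(rows, field) {
  var out = [];
  rows.forEach(function (row, index) {
    Object.keys(row[field]).forEach(function (name) {
      if (row[field][name] === 1) {
        out.push({
          id: index,            // assumed unique id of the original transaction
          time: row.time,
          host: row.host,
          header: name,
          numBytes: row.numBytes
        });
      }
    });
  });
  return out;
}

// Sum bytes per key, counting each original transaction at most once per key.
function sumBytesOnce(records, keyOf) {
  var totals = {};
  var seen = {};
  records.forEach(function (r) {
    var key = keyOf(r);
    var tag = key + "|" + r.id;
    if (seen[tag]) { return; }  // duplicate of a transaction already counted
    seen[tag] = true;
    totals[key] = (totals[key] || 0) + r.numBytes;
  });
  return totals;
}

var original = [
  {time: 0, host: "a.com", numBytes: 12,
   requestHeaders: {foo: 0, bar: 1, baz: 1}},
  {time: 1, host: "b.org", numBytes: 3,
   requestHeaders: {foo: 0, bar: 1, baz: 1}}
];

var flat = flattenByHeader(original, "requestHeaders");
var byHeader = sumBytesOnce(flat, function (r) { return r.header; });
var byHost = sumBytesOnce(flat, function (r) { return r.host; });
```

Here each transaction becomes two records (one for bar, one for baz). Grouping by header gives 15 bytes each for bar and baz, while grouping by host still gives 12 for a.com and 3 for b.org, where a naive sum over the duplicated records would give 24 and 6.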
Upvotes: 1