nrabinowitz

Reputation: 55678

Sort by multiple dimensions in crossfilter.js

I'm using Mike Bostock's crossfilter library to filter and sort large datasets. My problem: Given a dataset with multiple dimensions, how can I sort on more than one dimension at a time?

Example JSFiddle

Test dataset:

[
    { cat: "A", val:1 },
    { cat: "B", val:2 },
    { cat: "A", val:11 },
    { cat: "B", val:5 },
    { cat: "A", val:3 },
    { cat: "B", val:2 },
    { cat: "A", val:11 },
    { cat: "B", val:100 }
]

Example of desired output, sorting by cat, val (ascending):

[
    { cat: "A", val:1 },
    { cat: "A", val:3 },
    { cat: "A", val:11 },
    { cat: "A", val:11 },
    { cat: "B", val:2 },
    { cat: "B", val:2 },
    { cat: "B", val:5 },
    { cat: "B", val:100 }
]

The approach I've used thus far is to use string concatenation on the desired dimensions:

var combos = cf.dimension(function(d) { return d.cat + '|' + d.val; });

This works fine with multiple string-based dimensions, but won't work with numeric dimensions, because string comparison is not a natural sort ('4' > '11'). I think I could make this work with zero-padding on the numbers, but this could get expensive for a large dataset, so I'd prefer to avoid it. Is there another way that might work here, using crossfilter?
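To make the problem concrete, here is a minimal sketch in plain JavaScript, without crossfilter. The names `rows`, `concatKey` and `concatOrder` are made up for this illustration, and the default `Array.prototype.sort` stands in for the ordering of string keys:

```javascript
// Keys built by plain concatenation compare character by character,
// so "A|11" sorts before "A|4" even though 4 < 11.
var rows = [
    { cat: "A", val: 4 },
    { cat: "A", val: 11 },
    { cat: "B", val: 2 }
];

// Same key shape as the `combos` dimension above
function concatKey(d) {
    return d.cat + '|' + d.val;
}

// Default sort on strings is lexicographic
var concatOrder = rows.map(concatKey).sort();

console.log(concatOrder); // [ 'A|11', 'A|4', 'B|2' ]
```

The categories come out in the right order, but within "A" the value 11 lands ahead of 4.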

Bonus points for any solution that allows different dimensions to have different sort directions (ascending/descending).

Clarification: Yes, I may need to switch to a native Array.sort implementation. But the whole point of using crossfilter is that it's very, very fast, especially for large datasets, and it caches sort order in a way that makes repeated sorts even faster. So I'm really looking for a crossfilter-based answer here.

Upvotes: 4

Views: 4052

Answers (4)

Renaud

Reputation: 4668

I haven't tested the performance, but you could give d3.nest a go. Example code:

var nested = d3.nest()
.key(function(d) { return d.cat; })
.sortKeys(d3.ascending)
.sortValues(compareValues)
.entries(data);

See the whole fiddle here: http://jsfiddle.net/RFontana/bZX7Q/

And let me know what result you get if you run some jsperf :)
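The snippet above relies on a `compareValues` function that only appears in the fiddle. A plausible version, plus a way to flatten the nested result back into one array, might look like the sketch below. Both `compareValues` as written here and the `flatten` helper are assumptions, and the `{ key, values }` entries are written out by hand so the example runs without d3:

```javascript
// Assumed comparator for .sortValues(): ascending by val
function compareValues(a, b) {
    return a.val - b.val;
}

// Hand-written stand-in for the output of nest.entries(data):
// one { key, values } entry per category, keys already ascending
var nestedExample = [
    { key: "A", values: [{ cat: "A", val: 11 }, { cat: "A", val: 1 }, { cat: "A", val: 3 }].sort(compareValues) },
    { key: "B", values: [{ cat: "B", val: 5 }, { cat: "B", val: 2 }].sort(compareValues) }
];

// Hypothetical helper: concatenate each group's values in key order
function flatten(entries) {
    return entries.reduce(function(rows, entry) {
        return rows.concat(entry.values);
    }, []);
}

var flat = flatten(nestedExample);
console.log(flat.map(function(d) { return d.cat + " " + d.val; }));
// [ 'A 1', 'A 3', 'A 11', 'B 2', 'B 5' ]
```

For a descending sort on val, the comparator could return `b.val - a.val` instead.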

Upvotes: 0

nrabinowitz

Reputation: 55678

Here's what I ended up doing:

  • I still use string concatenation on a single new dimension, but
  • I convert the measure to a positive, comparable decimal before turning it into a string, using crossfilter to get the min/max:

    var vals = cf.dimension(function(d) { return d.val }),
        min = vals.bottom(1)[0].val,
        offset =  min < 0 ? Math.abs(min) : 0,
        max = vals.top(1)[0].val + offset,
        valAccessor = function(d) {
            // offset ensures positive numbers, fraction ensures sort order
            return ((d.val + offset) / max).toFixed(8);
        },
        combos = cf.dimension(function(d) { 
            return d.cat + '|' + valAccessor(d); 
        });
    

See working fiddle: http://jsfiddle.net/nrabinowitz/cQXNK/9/

This has the advantage of handling negative numbers properly, which isn't possible with zero-padding, as far as I can tell. It seems to be just as fast. The downside is that it requires creating a new dimension on the numeric column, but I usually need that dimension anyway.
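The same key construction can be checked on a plain array, and it might also cover the descending case by inverting the fraction. This is only a sketch: `Math.min`/`Math.max` stand in for `vals.bottom(1)` and `vals.top(1)`, and `makeValKey`, `sortedBy` and the `descending` flag are hypothetical names that are not part of the code above. Note that `toFixed(8)` means values closer together than about 1e-8 of the range get the same key.

```javascript
var points = [
    { cat: "A", val: -5 },
    { cat: "A", val: 11 },
    { cat: "A", val: 3 },
    { cat: "B", val: 100 },
    { cat: "B", val: -20 }
];

// Plain-array stand-ins for vals.bottom(1) and vals.top(1)
var values = points.map(function(d) { return d.val; });
var lo = Math.min.apply(null, values);
var shift = lo < 0 ? Math.abs(lo) : 0;
var hi = Math.max.apply(null, values) + shift;

// Hypothetical factory: descending = true inverts the fraction,
// so larger values get smaller keys
function makeValKey(descending) {
    return function(d) {
        var fraction = (d.val + shift) / hi;
        return (descending ? 1 - fraction : fraction).toFixed(8);
    };
}

// Sort rows by the concatenated string key, as a dimension would
function sortedBy(keyFn) {
    return points
        .map(function(d) { return { key: d.cat + '|' + keyFn(d), row: d }; })
        .sort(function(a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; })
        .map(function(e) { return e.row.cat + " " + e.row.val; });
}

var ascendingOrder = sortedBy(makeValKey(false));
var descendingOrder = sortedBy(makeValKey(true));

console.log(ascendingOrder);  // [ 'A -5', 'A 3', 'A 11', 'B -20', 'B 100' ]
console.log(descendingOrder); // [ 'A 11', 'A 3', 'A -5', 'B 100', 'B -20' ]
```

In the second result cat is still ascending while val is descending, which is the mixed-direction case from the question.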

Upvotes: 1

Gabriel Gartz

Reputation: 2870

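Using Array.prototype.sort, you can do this:

// Left-pad a value with zeros so that string comparison matches
// numeric order (works for non-negative integers only)
function pad(str, max) {
    str = String(str);
    return str.length < max ? pad("0" + str, max) : str;
}

function sortByPriority(a, b) {
    var p = sortByPriority.properties;
    if (!p) {
        // No properties configured: fall back to plain numeric comparison
        return a - b;
    }
    // Build one padded key per record, in priority order
    var ar = '', br = '';
    for (var i = 0, max = p.length; i < max; i++) {
        ar += pad(a[p[i]], 10);
        br += pad(b[p[i]], 10);
    }
    return ar == br ? 0 : ar > br ? 1 : -1;
}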

How to use:

Sorting by cat, then val:

sortByPriority.properties = ['cat', 'val'];
myArray.sort(sortByPriority);

Result:

  • A 1
  • A 3
  • A 11
  • A 11
  • B 2
  • B 2
  • B 5
  • B 100

To give val priority instead:

sortByPriority.properties = ['val', 'cat'];
myArray.sort(sortByPriority);

Result:

  • A 1
  • B 2
  • B 2
  • A 3
  • B 5
  • A 11
  • A 11
  • B 100

It's not very efficient code, but you can improve on it.

UPDATE:

You can use the pad function to get the same results with crossfilter; see this jsfiddle.

var combos = cf.dimension(function(d) { 
    return pad(d.cat, 10) + '|' + pad(d.val, 10); 
});

You can also set the pad size to the length of the longest string in your collection, which ensures the ordering always holds.

See that optimization: http://jsfiddle.net/gartz/cQXNK/7/
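Sizing the pad from the data could look roughly like this. It is a sketch in plain JavaScript rather than the fiddle's code: `padTo`, `maxWidth`, `sample` and `paddedKeys` are hypothetical names, and it assumes non-negative integers and single-character categories, so only val is padded:

```javascript
// Hypothetical helper: left-pad with zeros to a fixed width
function padTo(value, width) {
    var str = String(value);
    while (str.length < width) {
        str = "0" + str;
    }
    return str;
}

// Hypothetical helper: length of the longest stringified value of `prop`
function maxWidth(rows, prop) {
    return rows.reduce(function(w, d) {
        return Math.max(w, String(d[prop]).length);
    }, 0);
}

var sample = [
    { cat: "B", val: 100 },
    { cat: "A", val: 11 },
    { cat: "A", val: 3 },
    { cat: "B", val: 2 }
];

// One pass over the data to find the width (3 for this sample)
var valWidth = maxWidth(sample, "val");

// Same key shape as the combos dimension, with a data-driven width
var paddedKeys = sample.map(function(d) {
    return d.cat + '|' + padTo(d.val, valWidth);
}).sort();

console.log(paddedKeys); // [ 'A|003', 'A|011', 'B|002', 'B|100' ]
```

The width has to be recomputed if records with longer values are added later, since keys padded to different widths no longer compare correctly.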

Upvotes: 1

Justin Bicknell

Reputation: 4808

I know it's not using the crossfilter library, but why not use the sort function to do this?

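// Plain Array.prototype.sort, assuming `data` is the raw array of records
// (a crossfilter instance has no sort method)
var sorted = data.sort(function(a, b) {
    if (a.cat == b.cat) return a.val < b.val ? -1 : a.val > b.val ? 1 : 0;
    return a.cat < b.cat ? -1 : 1;
});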

See http://jsfiddle.net/cQXNK/5/

To allow different dimensions to have different sort directions, it would just be a matter of swapping -1 for 1 and vice versa in the comparison for that dimension.
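One way to generalize the swap is a small comparator factory that takes a direction per property. This is a sketch with made-up names (`compareBy`, `records`, `mixed`), again using a native array sort rather than crossfilter:

```javascript
// Hypothetical helper: build a comparator from [property, direction] pairs,
// where direction is 1 (ascending) or -1 (descending)
function compareBy(spec) {
    return function(a, b) {
        for (var i = 0; i < spec.length; i++) {
            var prop = spec[i][0], dir = spec[i][1];
            if (a[prop] < b[prop]) return -dir;
            if (a[prop] > b[prop]) return dir;
            // equal on this property: fall through to the next one
        }
        return 0; // equal on every property
    };
}

var records = [
    { cat: "A", val: 1 },
    { cat: "B", val: 2 },
    { cat: "A", val: 11 },
    { cat: "B", val: 100 }
];

// cat ascending, val descending; slice() leaves `records` untouched
var mixed = records.slice().sort(compareBy([["cat", 1], ["val", -1]]));

console.log(mixed.map(function(d) { return d.cat + " " + d.val; }));
// [ 'A 11', 'A 1', 'B 100', 'B 2' ]
```

Because the values are compared directly rather than as strings, numeric columns sort naturally without any padding.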

Upvotes: 2
