Morpheu5
Morpheu5

Reputation: 2801

Sort array of Dicts by multiple keys, in Julia

I have a large array of Dicts that all look like this

{
  "id": 12345,
  "user_id": "6789",
  "question_id": "some_question_id",
  "correct": "true",
  "actions": "...",
  "consequentiality": 0,
  "timestamp": 1505123456.000
}

and I need to sort them by (question_id, user_id, id, consequentiality) with question_id moving the slowest, and consequentiality moving the faster -- kind of like grouping and subgrouping, if you will, but I need to perform swaps on an ordered array in certain cases, and most of these tend to happen across groups. I've been playing around with Base.sort, passing different functions to by and lt. The best I came up with is combining multiple sorts one inside the other and passing the different keys to each by clause, something like

sort(sort(sort(sort(df, by=x->x["question_id"]), by=x->x["user_id"] ...

you get the picture. So far I haven't been able to reach a satisfactory stable ordering, even when using a stable algorithm such as MergeSort.

Help?

EDIT Would it make sense to use a tuple in the by clause? How do I reverse the ordering of a non-numeric element, though?

Upvotes: 2

Views: 1084

Answers (1)

Dan Getz
Dan Getz

Reputation: 18217

Here is a demonstration of a method to cleanup the solution using lt=... in the comments. Note this works only with tuples, as it redefines isless for tuples. If there is enthusiasm, maybe something like this can be incorporated into Julia or some package.

julia> struct RevNext
       end

julia> import Base: isless

julia> function isless(t1::Tuple, t2::Tuple)
           n1, n2 = length(t1), length(t2)
           reverse = false
           for i = 1:min(n1, n2)
               a, b = t1[i], t2[i]
               if !isequal(a, b)
                   return reverse ? isless(b, a) : isless(a,b)
               else
                   reverse = isa(a,RevNext)
               end
           end
           return n1 < n2
       end
WARNING: Method definition isless(Tuple, Tuple) ...
isless (generic function with 53 methods)

And to use it, we write sort(M,by=x->(x[:b], RevNext(), x[:a])). An example for a randomly generated vector M:

julia> M = [Dict(:a=>rand(),:b=>rand(Bool)) for i=1:10]
10-element Array{Dict{Symbol,Any},1}:
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.735352),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.537437),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.314947),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.9723),Pair{Symbol,Any}(:b, false))  
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.605042),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.256509),Pair{Symbol,Any}(:b, false))
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.133487),Pair{Symbol,Any}(:b, false))
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.320249),Pair{Symbol,Any}(:b, false))
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.409549),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.421471),Pair{Symbol,Any}(:b, true)) 

julia> sort(M,by=x->(x[:b], RevNext(), x[:a]))
10-element Array{Dict{Symbol,Any},1}:
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.9723),Pair{Symbol,Any}(:b, false))  
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.320249),Pair{Symbol,Any}(:b, false))
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.256509),Pair{Symbol,Any}(:b, false))
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.133487),Pair{Symbol,Any}(:b, false))
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.735352),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.605042),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.537437),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.421471),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.409549),Pair{Symbol,Any}(:b, true)) 
 Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.314947),Pair{Symbol,Any}(:b, true)) 

Upvotes: 2

Related Questions