Reputation: 2801
I have a large array of Dict
s that all look like this
{
"id": 12345,
"user_id": "6789",
"question_id": "some_question_id",
"correct": "true",
"actions": "...",
"consequentiality": 0,
"timestamp": 1505123456.000
}
and I need to sort them by (question_id, user_id, id, consequentiality)
with question_id
moving the slowest, and consequentiality
moving the faster -- kind of like grouping and subgrouping, if you will, but I need to perform swaps on an ordered array in certain cases, and most of these tend to happen across groups. I've been playing around with Base.sort
, passing different functions to by
and lt
. The best I came up with is combining multiple sorts one inside the other and passing the different keys to each by
clause, something like
sort(sort(sort(sort(df, by=x->x["question_id"]), by=x->x["user_id"] ...
you get the picture. So far I haven't been able to reach a satisfactory stable ordering, even when using a stable algorithm such as MergeSort
.
Help?
EDIT Would it make sense to use a tuple in the by
clause? How do I reverse the ordering of a non-numeric element, though?
Upvotes: 2
Views: 1084
Reputation: 18217
Here is a demonstration of a method to cleanup the solution using lt=...
in the comments. Note this works only with tuples, as it redefines isless
for tuples. If there is enthusiasm, maybe something like this can be incorporated into Julia or some package.
julia> struct RevNext
end
julia> import Base: isless
julia> function isless(t1::Tuple, t2::Tuple)
n1, n2 = length(t1), length(t2)
reverse = false
for i = 1:min(n1, n2)
a, b = t1[i], t2[i]
if !isequal(a, b)
return reverse ? isless(b, a) : isless(a,b)
else
reverse = isa(a,RevNext)
end
end
return n1 < n2
end
WARNING: Method definition isless(Tuple, Tuple) ...
isless (generic function with 53 methods)
And to use it, we write sort(M,by=x->(x[:b], RevNext(), x[:a]))
. An example for a randomly generated vector M
:
julia> M = [Dict(:a=>rand(),:b=>rand(Bool)) for i=1:10]
10-element Array{Dict{Symbol,Any},1}:
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.735352),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.537437),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.314947),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.9723),Pair{Symbol,Any}(:b, false))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.605042),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.256509),Pair{Symbol,Any}(:b, false))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.133487),Pair{Symbol,Any}(:b, false))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.320249),Pair{Symbol,Any}(:b, false))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.409549),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.421471),Pair{Symbol,Any}(:b, true))
julia> sort(M,by=x->(x[:b], RevNext(), x[:a]))
10-element Array{Dict{Symbol,Any},1}:
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.9723),Pair{Symbol,Any}(:b, false))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.320249),Pair{Symbol,Any}(:b, false))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.256509),Pair{Symbol,Any}(:b, false))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.133487),Pair{Symbol,Any}(:b, false))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.735352),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.605042),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.537437),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.421471),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.409549),Pair{Symbol,Any}(:b, true))
Dict{Symbol,Any}(Pair{Symbol,Any}(:a, 0.314947),Pair{Symbol,Any}(:b, true))
Upvotes: 2