Reputation: 2151
I'm trying to sort a collection of documents based on a field which is a field of a subdocument's subdocument.
Here's a very simplified version of my documents:
{
"_id": ObjectId("536900cdb4f805efff8b075b"),
"name": "el1",
"versions": [{
"releases": [{
"rd": ISODate("2064-05-05T15:36:10.098Z")
}, {
"rd": ISODate("2014-05-01T16:00:00Z")
}]
}, {
"releases": [{
"rd": ISODate("2064-05-04T15:36:10.098Z")
}, {
"rd": ISODate("2014-05-01T14:00:00Z")
}]
}]
}, {
"_id": ObjectId("536900f2b4f805efff8b075c"),
"name": "el2",
"versions": [{
"releases": [{
"rd": ISODate("2064-05-05T15:36:10.098Z")
}, {
"rd": ISODate("2014-05-01T17:00:00Z")
}]
}]
}
As you can see each document might have subdocuments named version
and each version
might have multiple subdocuments named release
. I'd like to sort the main documents based on the rd
field while excluding from the sort
calculation all dates which are greater than a year from now. I don't care about sorting the subdocuments within the main document.
i.e. ISODate("2064-05-05T15:36:10.098Z")
should be ignored because too far away while ISODate("2014-05-01T16:00:00Z")
is good. By "ignore" I mean: don't use that value in the sort calculation, and not: remove that document from the result.
I have tried several ways including map-reduce
and aggregation framework
but failed miserably.
This should be the output of a successful sort:
{
"_id": ObjectId("536900f2b4f805efff8b075c"),
"name": "el2",
"versions": [{
"releases": [{
"rd": ISODate("2064-05-05T15:36:10.098Z")
}, {
"rd": ISODate("2014-05-01T17:00:00Z")
}]
}]
}, {
"_id": ObjectId("536900cdb4f805efff8b075b"),
"name": "el1",
"versions": [{
"releases": [{
"rd": ISODate("2064-05-05T15:36:10.098Z")
}, {
"rd": ISODate("2014-05-01T16:00:00Z")
}]
}, {
"releases": [{
"rd": ISODate("2064-05-04T15:36:10.098Z")
}, {
"rd": ISODate("2014-05-01T14:00:00Z")
}]
}]
}
Upvotes: 0
Views: 251
Reputation: 3402
Please find two solutions to your question in the test case below. The first solution uses the MongoDB aggregation framework. For each document, sort keys are projected out of the rd values based on your time limit. The nested sort-key structure is reduced by unwinding twice followed by grouping for the max sort key. After sorting the docs, the last 'project' stage removes the sort keys. The second solution sorts on the client side. For efficiency, it processes each doc to determine the sort key and merges it in. After sorting the docs, it deletes the sort key out of each doc. You can eliminate the removal of the sort keys if their presence is tolerable.
A major strength of MongoDB is that documents map well to programming language data structures. So I would recommend using Ruby for a first attempt at a solution before looking for a DB solution. Note that in the Ruby solution, the rd_sort_key method while straightforward is non-trivial, suggesting that what you attempting with a condition and nested arrays is fairly complex, even without trying to do it in MongoDB's aggregation framework.
The client-side solution is OK if you are fetching the whole result set without a limit. If you use a limit, the server-side solution may save you some transfer time. But as always, you should measure and compare.
I hope that this helps, and that it is interesting and perhaps illuminating.
test.rb
require 'mongo'
require 'date'
require 'test/unit'
def iso_date_to_time(s)
DateTime.parse(s).to_time
end
class MyTest < Test::Unit::TestCase
def setup
@pipeline = [
{'$project' => {
'name' => '$name',
'versions' => '$versions',
'rd_sort_key' => {
'$map' => {
'input' => '$versions', 'as' => 'version', 'in' => {
'$map' => {
'input' => '$$version.releases', 'as' => 'release', 'in' => {
'$cond' => [
{'$lt' => ['$$release.rd', @year_from_now]},
'$$release.rd',
nil
]}}}}}}},
{'$unwind' => '$rd_sort_key'},
{'$unwind' => '$rd_sort_key'},
{'$group' => {
'_id' => '$_id',
'name' => {'$first' => '$name'},
'versions' => {'$first' => '$versions'},
'rd_sort_key' => {'$max' => '$rd_sort_key'}}},
{'$sort' => {'rd_sort_key' => -1}},
{'$project' => {
'_id' => '$_id',
'name' => '$name',
'versions' => '$versions'}}
]
@coll = Mongo::MongoClient.new['test']['events_h']
@docs = [
{"_id" => BSON::ObjectId("536900cdb4f805efff8b075b"),
"name" => "el1",
"versions" => [{"releases" => [{"rd" => iso_date_to_time("2064-05-05T15:36:10.098Z")},
{"rd" => iso_date_to_time("2014-05-01T16:00:00Z")}]},
{"releases" => [{"rd" => iso_date_to_time("2064-05-04T15:36:10.098Z")},
{"rd" => iso_date_to_time("2014-05-01T14:00:00Z")}]}]
},
{"_id" => BSON::ObjectId("536900f2b4f805efff8b075c"),
"name" => "el2",
"versions" => [{"releases" => [{"rd" => iso_date_to_time("2064-05-05T15:36:10.098Z")},
{"rd" => iso_date_to_time("2014-05-01T17:00:00Z")}]}]
}]
@expected_names = [@docs.last['name'], @docs.first['name']]
@coll.remove
@coll.insert(@docs)
@year_from_now = Time.now + 60*60*24*365
end
test "aggregation sort with map and conditional" do
result = @coll.aggregate(@pipeline)
assert_equal(@expected_names, result.collect{|doc| doc['name']})
end
def rd_sort_key(doc, future_time_limit)
sort_key = nil
doc['versions'].each do |version|
version['releases'].each do |release|
rd = release['rd']
sort_key = sort_key ? [sort_key, rd].max : rd if rd < future_time_limit
end
end
sort_key
end
test "client sort with conditional" do
result = @coll.find.to_a
result.each{|doc| doc['rd_sort_key'] = rd_sort_key(doc, @year_from_now)}
result = result.sort{|a, b| b['rd_sort_key'] ? b['rd_sort_key'] <=> a['rd_sort_key'] : -1}
result.each{|doc| doc.delete('rd_sort_key')}
assert_equal(@expected_names, result.collect{|doc| doc['name']})
end
end
$ ruby test.rb
Loaded suite test
Started
..
Finished in 0.008794 seconds.
2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
227.43 tests/s, 227.43 assertions/s
Upvotes: 1