Rami

Reputation: 8314

Monary: Aggregation framework, set the allowDiskUse option

I am using Monary to connect to my MongoDB, but I am struggling to figure out where and how to set the allowDiskUse option.

client = Monary("ip.address.of.server", 27017 , "username", "password", "dbname")

pipeline = [
        {"$group" : {
            "_id" : {"user":"$subscriber_id",
                 "month": { "$month" : "$timestamp" },
                 "day" : { "$dayOfMonth" : "$timestamp" },
                 "year" : { "$year" : "$timestamp" },
                 "hour" : { "$hour" : "$timestamp" },
                 "category":"$category_name"
                },
            "activities_sum":{"$sum":"$activity_count"}
            }
        }
    ]

with client as m:
    users, years, months, days, hours, categories, activities  = m.aggregate("digicel_exploration",
                "5_min_slots",
                pipeline,
                ["_id.user", "_id.year", "_id.month", "_id.day", "_id.hour", "_id.category", "activities_sum"],
                ["string:30", "int32", "int32", "int32", "int32", "string:60", "int32"])

Upvotes: 1

Views: 446

Answers (1)

Blakes Seven

Reputation: 50436

Monary uses the mongoc C driver directly underneath; it is not a layer over pymongo, which is the official Python driver maintained by MongoDB.

As a result, its aggregate() method has no way to pass through the "options" needed for settings such as "allowDiskUse".

You can see this in the implementation code, paying attention to the fourth and fifth arguments, which are hard-coded as NULL:

// Get an aggregation cursor
mcursor = mongoc_collection_aggregate(collection,
                                      MONGOC_QUERY_NONE,
                                      &pl_bson, NULL, NULL);

When you compare this to the documented signature for mongoc_collection_aggregate, the problem becomes clear, since the fourth argument is where the options belong:

mongoc_cursor_t *
mongoc_collection_aggregate (mongoc_collection_t       *collection,
                             mongoc_query_flags_t       flags,
                             const bson_t              *pipeline,
                             const bson_t              *options,
                             const mongoc_read_prefs_t *read_prefs)
   BSON_GNUC_WARN_UNUSED_RESULT;

If you need this option in your processing, then you would be better off using pymongo directly and loading up your NumPy arrays manually based on the results.
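As a rough sketch of that pymongo route: the helper names below (get_path, aggregate_columns, to_numpy) are made up for illustration and are not part of pymongo or Monary. The only pymongo call relied on is Collection.aggregate(), which accepts allowDiskUse as a keyword argument. The dtypes in the usage note are guesses based on the types in the question.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as "_id.user" through nested dicts."""
    value = doc
    for key in dotted.split("."):
        value = value[key]
    return value


def aggregate_columns(collection, pipeline, fields):
    """Run the pipeline with allowDiskUse and return one list per field.

    `collection` is expected to be a pymongo Collection (anything with a
    compatible aggregate() method works).
    """
    # pymongo forwards allowDiskUse to the server's aggregate command.
    cursor = collection.aggregate(pipeline, allowDiskUse=True)
    columns = [[] for _ in fields]
    for doc in cursor:
        for column, field in zip(columns, fields):
            column.append(get_path(doc, field))
    return columns


def to_numpy(columns, dtypes):
    """Convert the column lists to NumPy arrays (NumPy imported lazily)."""
    import numpy as np
    return [np.array(col, dtype=dt) for col, dt in zip(columns, dtypes)]
```

With a real connection, usage would look something like the following, reusing the pipeline, database and collection names from the question:

    from pymongo import MongoClient
    coll = MongoClient("ip.address.of.server", 27017)["digicel_exploration"]["5_min_slots"]
    fields = ["_id.user", "_id.year", "_id.month", "_id.day", "_id.hour", "_id.category", "activities_sum"]
    cols = aggregate_columns(coll, pipeline, fields)
    users, years, months, days, hours, categories, activities = to_numpy(
        cols, ["U30", "int32", "int32", "int32", "int32", "U60", "int32"])

This builds plain Python lists first, so it will use more memory than Monary's direct-to-array loading on large result sets.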

Alternatively, you could take the approach already suggested in a reported issue on the subject and patch the source yourself, if you are prepared to build Monary yourself:

bson_t opts;
bson_init(&opts);
BSON_APPEND_BOOL (&opts, "allowDiskUse", true);
mcursor = mongoc_collection_aggregate(collection,
                                      MONGOC_QUERY_NONE,
                                      &pl_bson, &opts, NULL);
bson_destroy(&opts);

Or you could go further and provide a full patch that adds an options argument to the method signature and passes it through correctly.

Upvotes: 1
