Reputation: 341
My goal is to return multiple questionElements where the questionElements metaTag entry equals my search. E.G. if a metaTag element equals my string the return it's parent questionEntry element and search across all elements nested in show.
So what I want is the to match the document that contains the required "metaTags" value, AND "filter" any sub document arrays that do not contain this inner match
This is what I haved tried as an aggregation query with $redact
, but it does not give the results that I want:
db.mongoColl.aggregate([{"$redact":{"$cond": { if: {$gt:[ {"$size": {
$setIntersection : [ { "$ifNull": [ "$metaTags", []]},
["MySearchString"]]} } , 0 ]} , then:"$$PRUNE",
else:"$$DESCEND" }}}]).pretty();
My instancess are:
private DB mongoDatabase;
private DBCollection mongoColl;
private DBObject dbObject;
// Singleton class
// Create client (server address(host,port), credential, options)
mongoClient = new MongoClient(new ServerAddress(host, port),
Collections.singletonList(credential),
options);
mongoDatabase = ClientSingleton.getInstance().getClient().getDB("MyDB");
My document in the database to match is:
{
"show":[
{
"season":[
{
"episodes":[
{
"questionEntry":{
"id":1,
"info":{
"seasonNumber":1,
"episodeNumber":5,
"episodeName":"A Hero Sits Next Door"
},
"questionItem":{
"theQuestion":"What is the name of the ringer hired by Mr. Weed?",
"attachedElement":{
"type":1,
"value":""
}
},
"options":[
{
"type":1,
"value":"Johnson"
},
{
"type":1,
"value":"Hideo"
},
{
"type":1,
"value":"Guillermo"
}
],
"answer":{
"questionId":1,
"answer":3
},
"metaTags":[
"Season 1",
"Episode 5",
"Trivia",
"Arya Stark",
"House Stark"
]
}
}
]
}
]
}
]
}
However, if any arrays within the document do not contain the "metaTags" value to be matched i.e "Arya Stark", then I do not want any elements of that array to be matched at all in the result. The "metaTags" can stay as it is.
I am running the latest java driver and using java SE1.7 compiler in Eclipse if that makes any difference to the response.
Upvotes: 2
Views: 3639
Reputation: 50406
The $redact
operator was really not the best choice here, or is the logic that simple, being a major contributor to why the attempted query does not work. The "redaction" options are pretty much an "all or nothing" process over a singular specific condition, and that condition can be used to $$DESCEND
, and therefore traverse the levels of the document.
At best you get a lot of "false positives" by transposing a values where the field in the codition does not exist. At worst, you end up removing the entire document, contrary to that it can be a match. It has it's uses, but this is not really one of them.
First a simplified sample based on your structure. This is mainly to be able to visualize the things we want to filter from the content:
{
"show": [
{
"name": "Game of Thrones",
"season": [
{
"_id": 1,
"episodes": [
{
"_id": 1,
"metaTags": [
"Arya Stark"
]
},
{
"_id": 2,
"metaTags": [
"John Snow"
]
}
]
},
{
"_id": 2,
"episodes": [
{
"_id": 1,
"metaTags": [
"Arya Stark"
]
}
]
}
]
},
{
"name": "Seinfeld",
"season": [
{
"_id": 1,
"episodes": [
{
"_id": 1,
"metaTags": [
"Jerry Seinfeld"
]
}
]
}
]
}
]
}
There are two approaches to getting the results here. First there is a traditional approach using $unwind
in order to work with the arrays, which are then filtered using $match
and conditional expressions with of course serveral stages of $group
operations in order to reconstruct the arrays:
db.sample.aggregate([
{ "$match": {
"show.season.episodes.metaTags": "Arya Stark"
}},
{ "$unwind": "$show" },
{ "$unwind": "$show.season" },
{ "$unwind": "$show.season.episodes" },
{ "$unwind": "$show.season.episodes.metaTags" },
{ "$group": {
"_id": {
"_id": "$_id",
"show": {
"name": "$show.name",
"season": {
"_id": "$show.season._id",
"episodes": {
"_id": "$show.season.episodes._id",
}
}
}
},
"metaTags": { "$push": "$show.season.episodes.metaTags" },
"matched": {
"$sum": {
"$cond": [
{ "$eq": [ "$show.season.episodes.metaTags", "Arya Stark" ] },
1,
0
]
}
}
}},
{ "$sort": { "_id._id": 1, "_id.show.season.episodes._id": 1 } },
{ "$group": {
"_id": {
"_id": "$_id._id",
"show": {
"name": "$_id.show.name",
"season": {
"_id": "$_id.show.season._id",
},
}
},
"episodes": {
"$push": {
"$cond": [
{ "$gt": [ "$matched", 0 ] },
{
"_id": "$_id.show.season.episodes._id",
"metaTags": "$metaTags"
},
false
]
}
}
}},
{ "$unwind": "$episodes" },
{ "$match": { "episodes": { "$ne": false } } },
{ "$group": {
"_id": "$_id",
"episodes": { "$push": "$episodes" }
}},
{ "$sort": { "_id._id": 1, "_id.show.season._id": 1 } },
{ "$group": {
"_id": {
"_id": "$_id._id",
"show": {
"name": "$_id.show.name"
}
},
"season": {
"$push": {
"_id": "$_id.show.season._id",
"episodes": "$episodes"
}
}
}},
{ "$group": {
"_id": "$_id._id",
"show": {
"$push": {
"name": "$_id.show.name",
"season": "$season"
}
}
}}
])
That is all well and fine and is fairly easy to understand. however the processs of using $unwind
here creates a lot of overhead, particularly when we are just talking about filtering within the document itself, and not doing any grouping across documents.
There is a Modern approach to this, but be warned that while efficient, it is an absolute "monster" and very easy to get lost in the logic when dealing with embedded arrays:
db.sample.aggregate([
{ "$match": {
"show.season.episodes.metaTags": "Arya Stark"
}},
{ "$project": {
"show": {
"$setDifference": [
{ "$map": {
"input": "$show",
"as": "show",
"in": {
"$let": {
"vars": {
"season": {
"$setDifference": [
{ "$map": {
"input": "$$show.season",
"as": "season",
"in": {
"$let": {
"vars": {
"episodes": {
"$setDifference": [
{ "$map": {
"input": "$$season.episodes",
"as": "episode",
"in": {
"$cond": [
{ "$setIsSubset": [
"$$episode.metaTags",
["Arya Stark"]
]},
"$$episode",
false
]
}
}},
[false]
]
}
},
"in": {
"$cond": [
{ "$ne": [ "$$episodes", [] ] },
{
"_id": "$$season._id",
"episodes": "$$episodes"
},
false
]
}
}
}
}},
[false]
]
}
},
"in": {
"$cond": [
{ "$ne": ["$$season", [] ] },
{
"name": "$$show.name",
"season": "$$season"
},
false
]
}
}
}
}},
[false]
]
}
}}
])
There is lots of array processing in there with $map
and each level as well as variable declarations with $let
for each array, since we are both "filtering" content via $setDifference
and testing for empty arrays.
With a single pipeline $project
after the inital query match, this is much faster than the previous process.
Both produce the same filtered result:
{
"_id" : ObjectId("55b3455e64518e494632fa16"),
"show" : [
{
"name" : "Game of Thrones",
"season" : [
{
"_id" : 1,
"episodes" : [
{
"_id" : 1,
"metaTags" : [
"Arya Stark"
]
}
]
},
{
"_id" : 2,
"episodes" : [
{
"_id" : 1,
"metaTags" : [
"Arya Stark"
]
}
]
}
]
}
]
}
With all "show", "season" and "episodes" arrays completely filtered of any documents that did not match the inner "metaTags" condition. The "metaTags" array itself is untouched, and is only tested for a match via $setIsSubset
, and really only then in order to filter the "episodes" array content where it was not a match.
Converting this for usage is the Java driver is a fairly straighforward process, as this is just a data structure repsentation of Objects and lists. In the same wat you just build the same structures in Java using standard lists and BSON Document objects. But it's basically all list and map syntax:
MongoDatabase db = mongoClient.getDatabase("test");
MongoCollection<Document> collection = db.getCollection("sample");
String searchString = new String("Arya Stark");
List<Document> pipeline = Arrays.<Document>asList(
new Document("$match",
new Document("show.season.episodes.metaTags",searchString)
),
new Document("$project",
new Document("show",
new Document("$setDifference",
Arrays.<Object>asList(
new Document("$map",
new Document("input","$show")
.append("as","show")
.append("in",
new Document("$let",
new Document("vars",
new Document("season",
new Document("$setDifference",
Arrays.<Object>asList(
new Document("$map",
new Document("input","$$show.season")
.append("as","season")
.append("in",
new Document("$let",
new Document("vars",
new Document("episodes",
new Document("$setDifference",
Arrays.<Object>asList(
new Document("$map",
new Document("input","$$season.episodes")
.append("as","episode")
.append("in",
new Document("$cond",
Arrays.<Object>asList(
new Document("$setIsSubset",
Arrays.<Object>asList(
"$$episode.metaTags",
Arrays.<Object>asList(searchString)
)
),
"$$episode",
false
)
)
)
),
Arrays.<Object>asList(false)
)
)
)
)
.append("in",
new Document("$cond",
Arrays.<Object>asList(
new Document("$ne",
Arrays.<Object>asList(
"$$episodes",
Arrays.<Object>asList()
)
),
new Document("_id","$$season._id")
.append("episodes","$$episodes"),
false
)
)
)
)
)
),
Arrays.<Object>asList(false)
)
)
)
)
.append("in",
new Document("$cond",
Arrays.<Object>asList(
new Document("$ne",
Arrays.<Object>asList(
"$$season",
Arrays.<Object>asList()
)
),
new Document("name","$$show.name")
.append("season","$$season"),
false
)
)
)
)
)
),
Arrays.<Object>asList(false)
)
)
)
)
);
System.out.println(JSON.serialize(pipeline));
AggregateIterable<Document> result = collection.aggregate(pipeline);
MongoCursor<Document> cursor = result.iterator();
while (cursor.hasNext()) {
Document doc = cursor.next();
System.out.println(doc.toJson());
}
As stated earlier, this is a "monster" of syntax and it should give some insight into how difficult it can be to deal with multiple levels of nested arrays in your documents. Anything beyond a singlular array is notoriously difficult to deal with, and essentially impossible to perform atomic updates on due to the restrictions of the positional operator.
So this will work, and you really only need to add in that "metaTags" is embedded within a "questionEntry" object. So replace anything there with "questionEntry.metaTags" instead. But you might however consider changing your schema from this form, in order to make life easier in a lot of coding and maintenance, as well as making things available for atomic updates.
Upvotes: 2
Reputation: 573
You can use following code for aggregation :
mongoClient = new MongoClient("127.0.0.1", 27017);
DB db = mongoClient.getDB("db_name");
DBCollection dbCollection = db.getCollection("collection_name");
//make aggregation pipeline here
List<DBObject> pipeline = new ArrayList<DBObject>();
AggregationOutput output = dbCollection.aggregate(pipeline);
List<DBObject> results = (List<DBObject>) output.results();
//iterate this list and cast DBObject to your POJO
you can cast DBObject
To POJO
or can get value from DBObject
using below method :
dbObject.get("key");
Upvotes: 0