I am new to Elasticsearch. I have two document types, JarFileData and ClassData, which are linked by the jarFileId field.
This is a ClassData document:
{
  "_id" : ObjectId("59881021e950041f0c6fa1fa"),
  "ClassName" : "Exception",
  "jarFileId" : "JAR-0001",
  "dependencies" : [
    {
      "dependedntClass" : "java/lang/RuntimeException",
      "methodSignature" : "<init>"
    },
    {
      "dependedntClass" : "java/awt/EventQueue",
      "methodSignature" : "isDispatchThread"
    },
    {
      "dependedntClass" : "Exception",
      "methodSignature" : "setStackTrace"
    }
  ]
}
This is a JarFileData document:
{
  "_id" : ObjectId("59881021e950041f0c6fa1f7"),
  "jarFileName" : "Client.jar",
  "jarFileId" : "JAR-0001",
  "directory" : "C:\\Projects\\Test\\Application",
  "version" : null,
  "artifactID" : null,
  "groupID" : null
}
Given a directory, I want to get all jar files in that directory and then use them to find the dependent classes in the ClassData type for those jar files.
This is the function I use in Node.js to retrieve JarFileData documents for a given directory:
const test = function test() {
  // Request body: up to 20 JarFileData documents whose directory matches the path
  let body = {
    size: 20,
    from: 0,
    query: {
      match: {
        directory: 'C:\\Projects\\Test\\Application'
      }
    }
  };
  return body;
};
I am trying to use the result set from the above query to query the ClassData type. I have been stuck on this part for a long time and don't know how to do it in Elasticsearch. Any help would be much appreciated.
Before you can go further, there are two steps that need to be done:
1. The jarFileId and dependedntClass fields should be mapped as the keyword type (if this is a problem, you can use a multi-field with a keyword sub-field and use that in the query).
2. dependencies should be a nested object.
Looking at your data, the joining element between these two types of documents is the jarFileId field. If your existing query returned, for example, this list of jars:
[{"jarFileId": "JAR-0001"}, {"jarFileId": "JAR-0002"}]
then, with this information, you can use this query:
{
  "size":0,
  "query":{
    "constant_score":{
      "filter":{
        "terms":{ "jarFileId":["JAR-0001","JAR-0002"] }
      }
    }
  },
  "aggs":{
    "filtered":{
      "filter":{
        "constant_score":{
          "filter":{
            "terms":{ "jarFileId":["JAR-0001","JAR-0002"] }
          }
        }
      },
      "aggs":{
        "dependent":{
          "nested":{
            "path":"dependencies"
          },
          "aggs":{
            "classes":{
              "terms":{
                "field":"dependencies.dependedntClass"
              }
            }
          }
        }
      }
    }
  }
}
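In Node.js, the same request body can be generated from whatever list of IDs the JarFileData query returned, instead of hard-coding the IDs twice. This is a minimal sketch: buildDependencyQuery is a hypothetical helper name, and the field names are the ones from the question (including the dependedntClass spelling).

```javascript
// Hypothetical helper: builds the ClassData request body shown above
// for an arbitrary list of jarFileId values.
function buildDependencyQuery(jarFileIds) {
  // The same terms filter is used by the query and by the "filtered" aggregation
  const jarFilter = { terms: { jarFileId: jarFileIds } };
  return {
    size: 0, // only the aggregation is needed, not the hits themselves
    query: { constant_score: { filter: jarFilter } },
    aggs: {
      filtered: {
        filter: { constant_score: { filter: jarFilter } },
        aggs: {
          dependent: {
            // Step into the nested dependencies objects
            nested: { path: 'dependencies' },
            aggs: {
              // One bucket per distinct dependent class
              classes: { terms: { field: 'dependencies.dependedntClass' } }
            }
          }
        }
      }
    }
  };
}

const body = buildDependencyQuery(['JAR-0001', 'JAR-0002']);
console.log(JSON.stringify(body, null, 2));
```

Note that a terms aggregation returns only the top 10 buckets by default, so if the selected jars can depend on more distinct classes, a size should be added next to field in the classes aggregation.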
And as a result you'll get:
{
  ...,
  "aggregations": {
    "filtered": {
      "doc_count": 1,
      "dependent": {
        "doc_count": 3,
        "classes": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "core/internal/TrackingEventQueue$TrackingException",
              "doc_count": 1
            },
            {
              "key": "java/awt/EventQueue",
              "doc_count": 1
            },
            {
              "key": "java/lang/RuntimeException",
              "doc_count": 1
            }
          ]
        }
      }
    }
  }
}
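The query and the result above rely on the two mapping requirements listed at the start of this answer. A minimal sketch of a ClassData mapping that satisfies them, written as a Node.js object so it can be passed to an index-creation request, assuming Elasticsearch 7 or later (typeless mappings; older versions nest the properties under a type name). Other fields are left to dynamic mapping.

```javascript
// Sketch of a ClassData mapping (assumes Elasticsearch 7+ typeless mappings).
const classDataMapping = {
  mappings: {
    properties: {
      // Exact-value field used by the terms filter on jarFileId
      jarFileId: { type: 'keyword' },
      // nested indexes each dependency object as its own hidden document,
      // which is what the nested aggregation iterates over
      dependencies: {
        type: 'nested',
        properties: {
          // Exact-value field used by the terms aggregation
          dependedntClass: { type: 'keyword' }
        }
      }
    }
  }
};

console.log(JSON.stringify(classDataMapping, null, 2));
```

If jarFileId has to stay a text field, the multi-field variant { type: 'text', fields: { keyword: { type: 'keyword' } } } works as well, but the queries then have to target jarFileId.keyword. Since the type of an existing field cannot be changed in place, applying this to existing data generally means creating a new index and reindexing into it.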
With your current model, it is not possible to do this with one query: Elasticsearch does not have a join mechanism. A single document must contain all the information Elasticsearch needs to decide whether it matches the query or not. This is nicely described here. So either you go with application-side joins (there is an example similar to yours under the link), or you denormalize your data if search performance is the core issue here. The only built-in "join mechanism" that I'm aware of is Term Filter Lookup, but it can only fetch the lookup document by its id field.
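An application-side join in Node.js could look roughly like the sketch below: one search against JarFileData, a second search against ClassData using the collected IDs, and the dependent classes gathered in application code. It is not tied to a particular client library: search is a hypothetical function you supply that takes an index name and a request body and resolves to the parsed response. The index names jarfiledata and classdata are assumptions, as is mapping directory as keyword so that a term query matches the path exactly.

```javascript
// Sketch of an application-side join between JarFileData and ClassData.
// `search(index, body)` is a hypothetical caller-supplied function returning
// a Promise that resolves to the Elasticsearch response body.
// Index names 'jarfiledata' and 'classdata' are assumptions.
async function findDependentClasses(search, directory) {
  // Step 1: find the jar files stored under the given directory
  // (assumes `directory` is mapped as keyword, so term is an exact match).
  const jarResponse = await search('jarfiledata', {
    size: 1000, // arbitrary cap; larger result sets need pagination
    query: { term: { directory } },
    _source: ['jarFileId']
  });
  const jarFileIds = jarResponse.hits.hits.map(hit => hit._source.jarFileId);
  if (jarFileIds.length === 0) {
    return []; // no jars in this directory, nothing to join
  }

  // Step 2: fetch the ClassData documents belonging to those jar files.
  const classResponse = await search('classdata', {
    size: 1000,
    query: { terms: { jarFileId: jarFileIds } },
    _source: ['dependencies']
  });

  // Step 3: join in application code, collecting unique dependent classes.
  const classes = new Set();
  for (const hit of classResponse.hits.hits) {
    for (const dep of hit._source.dependencies || []) {
      classes.add(dep.dependedntClass);
    }
  }
  return [...classes].sort();
}
```

Injecting search keeps the join logic independent of the client; with an actual Elasticsearch client it would be a thin wrapper around that client's search call. The size of 1000 is an arbitrary assumption; bigger result sets need pagination (for example search_after or the scroll API).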