z3r0

Reputation: 93

Querying across multiple elasticsearch types

I want to fetch documents that are present in multiple types (type1 AND type2 AND type3...) in Elasticsearch 5.0. I know that searching across multiple types is possible, either by listing the types in the URL (type1,type2) or by filtering on the _type field. But all of these conditions are OR (type1 OR type2). How do I achieve an AND condition?

Here are two documents in my index:

{
   "_index":"cust_58e8700034fa4e368590fb1396e2641c",
   "_type":"unique-fp-domains",
   "_id":"n_d4dbba7309a94503b25eca735078f17c_258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
   "_version":2,
   "_score":1,
   "_source":{
      "mg_timestamp":1579866709096,
      "violated-directive":"connect-src",
      "fp-hash":"258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
      "time":1579866709096,
      "scan-id":"n_d4dbba7309a94503b25eca735078f17c",
      "blocked-uri":"play.sundaysky.com"
   }
}


{
   "_index":"cust_58e8700034fa4e368590fb1396e2641c",
   "_type":"tag-alexa-top1k-using-csp-tld-domain",
   "_id":"AW_XY4P4kmprPQ28bTUb",
   "_version":1,
   "_score":1,
   "_source":{
      "tagged-domain":"sundaysky.com",
      "tag-guidance":"FP",
      "additional-tag-metadata-isbase64-encoded":"eyJ0b3RhbC1hbGV4YS1tYXRjaGVzIjoyMzh9",
      "project-id":2,
      "fp-hash":"258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
      "scan-id":"n_d4dbba7309a94503b25eca735078f17c"
   }
}

I want to fetch the documents of both of these types, from the same index, that have "scan-id":"n_d4dbba7309a94503b25eca735078f17c".

I tried this:

{
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "_type": {
                    "value": "tag-alexa-top1k-using-csp-tld-domain"
                  }
                }
              },
              {
                "term": {
                  "scan-id": {
                    "value": "n_d4dbba7309a94503b25eca735078f17c"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "_type": {
                    "value": "unique-fp-domains"
                  }
                }
              },
              {
                "term": {
                  "scan-id": {
                    "value": "n_d4dbba7309a94503b25eca735078f17c"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

But it doesn't work.

Upvotes: 3

Views: 439

Answers (4)

Nikolay Vasiliev

Reputation: 6066

Elasticsearch is not good at joining different collections of documents, but in your case you might be able to solve the problem with a parent-child relationship.

How to query many index types together in an AND fashion?

If you have a one-to-many relationship, you can model it with parent-child. Let's suppose that the type unique-fp-domains is the "parent" type and that its scan-id field is a unique identifier. Let's also suppose that tag-alexa-top1k-using-csp-tld-domain is the "child" type and that every document of this type refers to exactly one document in unique-fp-domains.

Then we should create the Elasticsearch mapping in the following way:

PUT /cust_58
{
  "mappings": {
    "unique-fp-domains": {},
    "tag-alexa-top1k-using-csp-tld-domain": {
      "_parent": {
        "type": "unique-fp-domains" 
      }
    }
  }
}

And insert the documents like this:

# "parent"
PUT /cust_58/unique-fp-domains/n_d4dbba7309a94503b25eca735078f17c
{
    "mg_timestamp": 1579866709096,
    "violated-directive": "connect-src",
    "fp-hash": "258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
    "time": 1579866709096,
    "scan-id": "n_d4dbba7309a94503b25eca735078f17c",
    "blocked-uri": "play.sundaysky.com"
}

# "child"
POST /cust_58/tag-alexa-top1k-using-csp-tld-domain?parent=n_d4dbba7309a94503b25eca735078f17c
{
    "tagged-domain": "sundaysky.com",
    "tag-guidance": "FP",
    "additional-tag-metadata-isbase64-encoded": "eyJ0b3RhbC1hbGV4YS1tYXRjaGVzIjoyMzh9",
    "project-id": 2,
    "fp-hash": "258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
    "scan-id": "n_d4dbba7309a94503b25eca735078f17c"
}

Now we can query for parent documents that have at least one child associated with them. This is effectively a join on the parent ID, which we forced to be the scan-id by setting the _id of the parent document manually.

The query will use has_child and will look like this:

POST /cust_58/unique-fp-domains/_search
{
    "query": {
        "has_child": {
            "type": "tag-alexa-top1k-using-csp-tld-domain",
            "query": {
                "match_all": {}
            },
            "inner_hits": {}
        }
    }
}

Note that we use inner_hits to tell Elasticsearch to retrieve the matched "child" documents.

The output would look like:

    "hits": [
      {
        "_index": "cust_58",
        "_type": "unique-fp-domains",
        "_id": "n_d4dbba7309a94503b25eca735078f17c",
        "_score": 1.0,
        "_source": {
          "mg_timestamp": 1579866709096,
          "violated-directive": "connect-src",
...
        },
        "inner_hits": {
          "tag-alexa-top1k-using-csp-tld-domain": {
            "hits": {
              "total": 1,
              "max_score": 1.0,
              "hits": [
                {
                  "_type": "tag-alexa-top1k-using-csp-tld-domain",
                  "_id": "AW_xhfnnIzWDkoWd1czA",
                  "_score": 1.0,
                  "_routing": "n_d4dbba7309a94503b25eca735078f17c",
                  "_parent": "n_d4dbba7309a94503b25eca735078f17c",
                  "_source": {
                    "tagged-domain": "sundaysky.com",
...
                  }
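
To consume a response of this shape on the client side, something like the following sketch may help. It assumes the response has already been parsed into a Python dict (for example from the JSON body returned by the search); the helper name pair_parents_with_children and the trimmed sample response are made up for illustration and are not part of any Elasticsearch client.

```python
def pair_parents_with_children(response, child_type):
    """Return a list of (parent_source, [child_source, ...]) tuples.

    `response` is a parsed has_child search response that was requested
    with inner_hits; `child_type` is the key used under "inner_hits".
    """
    pairs = []
    for parent in response.get("hits", {}).get("hits", []):
        inner = parent.get("inner_hits", {}).get(child_type, {})
        children = [c["_source"] for c in inner.get("hits", {}).get("hits", [])]
        pairs.append((parent["_source"], children))
    return pairs


# Hand-written sample, shaped like the output above but heavily trimmed.
CHILD_TYPE = "tag-alexa-top1k-using-csp-tld-domain"
sample_response = {
    "hits": {
        "hits": [
            {
                "_type": "unique-fp-domains",
                "_id": "n_d4dbba7309a94503b25eca735078f17c",
                "_source": {"blocked-uri": "play.sundaysky.com"},
                "inner_hits": {
                    CHILD_TYPE: {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {
                                    "_type": CHILD_TYPE,
                                    "_source": {"tagged-domain": "sundaysky.com"},
                                }
                            ],
                        }
                    }
                },
            }
        ]
    }
}

pairs = pair_parents_with_children(sample_response, CHILD_TYPE)
for parent, children in pairs:
    print(parent["blocked-uri"], [c["tagged-domain"] for c in children])
```

Each tuple holds one parent document together with the child documents that matched it, which is the "present in both types" result the question asks for.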

What are the downsides of using parent-child?

  • the parent ID should be unique
  • join is only on parent ID
  • some performance overhead:

    If you care about query performance you should not use this query.

  • to enable parent-child one will have to change the mappings and reindex the existing data

Other important things to consider

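In Elasticsearch 6, an index can contain only one mapping type, and types are removed entirely in later versions. The good news is that the join datatype, which replaces _parent, is already available in late Elasticsearch 5 releases (5.6).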
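
As a rough sketch of what the same parent-child model could look like with the join datatype, the snippet below builds the request bodies as plain Python dicts. The field name doc_relation and the mapping type name doc are assumptions chosen for illustration; adapt them to your index. Child documents still have to be indexed with routing set to the parent ID so that they land on the same shard.

```python
import json

PARENT = "unique-fp-domains"
CHILD = "tag-alexa-top1k-using-csp-tld-domain"
JOIN_FIELD = "doc_relation"  # hypothetical field name, not from the question


def join_mapping(mapping_type="doc"):
    """Index-creation body declaring one parent -> child relation."""
    return {
        "mappings": {
            mapping_type: {
                "properties": {
                    JOIN_FIELD: {"type": "join", "relations": {PARENT: CHILD}}
                }
            }
        }
    }


def parent_doc(source):
    """Copy of `source` marked as a parent document."""
    doc = dict(source)
    doc[JOIN_FIELD] = {"name": PARENT}
    return doc


def child_doc(source, parent_id):
    """Copy of `source` marked as a child of `parent_id`.

    Index it with ?routing=<parent_id>.
    """
    doc = dict(source)
    doc[JOIN_FIELD] = {"name": CHILD, "parent": parent_id}
    return doc


scan_id = "n_d4dbba7309a94503b25eca735078f17c"
print(json.dumps(join_mapping(), indent=2))
print(json.dumps(child_doc({"tagged-domain": "sundaysky.com"}, scan_id), indent=2))
```

With this mapping the has_child query keeps the same shape as before, with "type" set to the child relation name.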

In general, Elasticsearch is not very good at managing relations between objects, but there are a few ways to deal with them.

Hope that helps!

Upvotes: 1

Abinash

Reputation: 103

"query": { "query_string" : { "query" : "(_type:unique-fp-domains OR _type:tag-alexa-top1k-using-csp-tld-domain) AND (scan-id:n_d4dbba7309a94503b25eca735078f17c)"
} }

Upvotes: 0

Mesut Aslan

Reputation: 31

I think this query will solve your problem:

"query": {
  "bool": {
    "must": [
      {
        "terms": {
          "_type": [
            "tag-alexa-top1k-using-csp-tld-domain",
            "unique-fp-domains"
          ]
        }
      }
    ],
    "filter": [
      {
        "term": {
          "scan-id": "n_d4dbba7309a94503b25eca735078f17c"
        }
      }
    ]
  }
}
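
A query that matches either type returns a flat list of hits, so it does not by itself tell you which scan-ids occur in every type. One possible way to get the AND semantics is to group the hits on the client side. The sketch below assumes the hits have already been parsed into Python dicts; the function name scan_ids_in_all_types and the second scan-id in the sample are invented for illustration.

```python
from collections import defaultdict


def scan_ids_in_all_types(hits, required_types, key="scan-id"):
    """Return the sorted `key` values that appear in every required _type."""
    types_seen = defaultdict(set)
    for hit in hits:
        value = hit["_source"].get(key)
        if value is not None:
            types_seen[value].add(hit["_type"])
    required = set(required_types)
    # Keep a value only if the set of types it was seen in covers all required types.
    return sorted(v for v, seen in types_seen.items() if required <= seen)


REQUIRED = ["unique-fp-domains", "tag-alexa-top1k-using-csp-tld-domain"]
hits = [
    {"_type": "unique-fp-domains",
     "_source": {"scan-id": "n_d4dbba7309a94503b25eca735078f17c"}},
    {"_type": "tag-alexa-top1k-using-csp-tld-domain",
     "_source": {"scan-id": "n_d4dbba7309a94503b25eca735078f17c"}},
    # Made-up scan-id that exists in only one type, so it is filtered out.
    {"_type": "unique-fp-domains",
     "_source": {"scan-id": "n_only_in_one_type"}},
]

both = scan_ids_in_all_types(hits, REQUIRED)
print(both)
```

This only works if all relevant hits are actually fetched, so for large result sets the size limit (or scrolling) has to be taken into account.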

Upvotes: 1

user12797531

Reputation: 1

You could use the multi search API (_msearch), which combines several searches into a single request and returns one response per search. You can find more information in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
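
As a minimal sketch of how such a request body could be assembled, the snippet below builds the newline-delimited JSON that _msearch expects: one header line and one body line per search, with a trailing newline. It assumes that scan-id is indexed as a non-analyzed (keyword) field so that a term query matches it, and that the header accepts a "type" key as in Elasticsearch 5.x; the helper name build_msearch_body is hypothetical.

```python
import json


def build_msearch_body(index, types, scan_id, size=100):
    """Build an NDJSON _msearch body with one search per document type."""
    lines = []
    for doc_type in types:
        # Header line: where to search.
        lines.append(json.dumps({"index": index, "type": doc_type}))
        # Body line: what to search for (assumes scan-id is a keyword field).
        lines.append(json.dumps({"size": size, "query": {"term": {"scan-id": scan_id}}}))
    # The multi search body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_msearch_body(
    "cust_58e8700034fa4e368590fb1396e2641c",
    ["unique-fp-domains", "tag-alexa-top1k-using-csp-tld-domain"],
    "n_d4dbba7309a94503b25eca735078f17c",
)
print(body)
```

The response contains one entry per search, in the same order, so the scan-id is present in both types only if both entries have at least one hit.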

Upvotes: -1
