manymanymore

Reputation: 3128

What is the difference between the `has` and `contains` operators in KQL?

What is the difference between the has and contains operators in KQL?

Here is the has operator documentation. Here is the documentation for the contains operator.

Both of them check for the existence of a case-insensitive string. So does that mean that choosing one operator over the other is just a matter of taste?

Upvotes: 13

Views: 22232

Answers (1)

David דודו Markovitz

Reputation: 44981

  • contains will always return true if the searched string exists within the searched text.
  • The result of has depends on what surrounds the searched string within the searched text.

Why should we prefer has over contains in some scenarios?
TL;DR: performance (use of index vs. data scan).


A term is a sequence of alpha-numeric characters (see What is a term?).

Some examples:

  • 1Hello2World, Z & 42 are all terms.
  • Hello-World, Hello_World & Hello World, are all constructed of 2 terms.
  • 056c8e97-422f-4e29-836b-ec3b0d77a50b (an example for GUID) is constructed of 5 terms.
  • !@#$%^&*() has no terms
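
To make the term boundaries concrete, here is a rough sketch that approximates term splitting with a regular expression. It assumes a term is simply a run of ASCII letters and digits, which matches the examples above but is only an approximation of how the engine actually tokenizes text. The column names terms and term_count are made up for illustration.

```kusto
// Approximation only: treat every run of ASCII letters/digits as one term.
datatable(txt:string)
[
    "1Hello2World"
   ,"Hello-World"
   ,"Hello_World"
   ,"056c8e97-422f-4e29-836b-ec3b0d77a50b"
   ,"!@#$%^&*()"
]
| extend terms = extract_all(@"([A-Za-z0-9]+)", txt)     // dynamic array of matches, null when nothing matches
| extend term_count = coalesce(array_length(terms), 0)   // number of terms, 0 when there are none
```

Under this approximation the GUID row should come out as five terms and the last row as zero.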

Azure Data Explorer (AKA ADX, AKA Kusto) indexes every term, as long as it is 3 characters long or more (for storage engine v3; for v2 it is 4 characters).
The index (full-text search index) is what enables ADX to return search results in sub-seconds/seconds even when the search is done over petabytes.
As of today, the index can be used for a whole term search, or for a prefix search.

Here is how contains & has behave for different searched strings:

  • ell
    • contains finds the searched string within texts such as ell, Hell, Ella, HELLO, 7ell8 & (.ell.), yielding a data scan (not using the index).
    • has finds the searched string within texts such as ell, Ell, ELL, & (.ell.), leveraging the index.
      has does not find the searched string if it is contained within a longer term (e.g., bell, Ella or Hello)
  • el
    • Similar to the example above, however neither operator uses the index, since the index contains only terms of 3 characters or more.
  • #ell#
    • Both operators return the same results.
    • Both operators leverage the index (whole term search) for an initial filtering on the term (ell), followed by a narrowed data scan (to filter on the entire searched string).
  • #ell
    • contains finds the searched string within texts such as #ell, @#Ell#@, #Ella etc.
    • has finds the searched string within texts such as #ell & @#Ell#@.
      has does not find the searched string if the term (ell) is a prefix of a longer term (e.g., Ella).
    • Both operators leverage the index (has performs a whole term search while contains performs a prefix search) for an initial filtering on the term (ell), followed by a narrowed data scan (to filter on the entire searched string).
    • In this context hasprefix will behave exactly like contains.
  • ell#
    • contains finds the searched string within texts such as ell#, @#Ell#@, Hell# etc.
    • has finds the searched string within texts such as ell# & @#Ell#@.
      has does not find the searched string if the term (ell) is a suffix of a longer term (e.g., Cell).
      has leverages the index while contains does not.
    • In this context hassuffix will behave exactly like contains.
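
The ell case above can be tried directly. The following is a small sketch that runs the four operators side by side on the sample texts from the first bullet. On a small in-memory datatable it only illustrates the matching semantics, not whether the index is used.

```kusto
datatable(txt:string)
[
    "ell"
   ,"Hell"
   ,"Ella"
   ,"HELLO"
   ,"7ell8"
   ,"(.ell.)"
]
| extend contains_ell  = txt contains  "ell"   // substring match anywhere
        ,has_ell       = txt has       "ell"   // whole-term match
        ,hasprefix_ell = txt hasprefix "ell"   // some term starts with "ell"
        ,hassuffix_ell = txt hassuffix "ell"   // some term ends with "ell"
```

Going by the descriptions above, contains should be true for every row, has only for ell and (.ell.), hasprefix additionally for Ella, and hassuffix additionally for Hell.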

P.S. Even if a term is indexed, the index might not be used, e.g., when a term is highly common, scanning the data itself directly might be cheaper than using the index.

Here are some examples of search strings that are found by contains but not by has.
Note the following:

  • contains always finds the searched substring (hell or hello).
  • has never finds the substring hell.
  • has finds the searched substring hello as long as it is not part of a longer alpha-numeric sequence.

datatable(txt:string)
[
    "Hello World"
   ,"<Hello-World>"
   ,"*Hello*World*"
   ,"?Hello%World!"
   ,"_Hello_World_"
   ,"123Hello-World456"
   ,"abcHello Worldxyz"
   ,"HelloWorld"
]
| extend contains_hell  = txt contains  "hell"
        ,contains_hello = txt contains  "hello"
        ,has_hell       = txt has       "hell"
        ,has_hello      = txt has       "hello"

| txt | contains_hell | contains_hello | has_hell | has_hello |
|---|---|---|---|---|
| Hello World | true | true | false | true |
| <Hello-World> | true | true | false | true |
| *Hello*World* | true | true | false | true |
| ?Hello%World! | true | true | false | true |
| _Hello_World_ | true | true | false | true |
| 123Hello-World456 | true | true | false | false |
| abcHello Worldxyz | true | true | false | false |
| HelloWorld | true | true | false | false |


Upvotes: 25
