Nyxter

Reputation: 405

Amazon CloudSearch: Filter if exists

I have an Amazon CloudSearch domain and want to filter on the field 'language' only when that field exists. Not all documents have a language. Documents that do have one should be filtered by it, but documents without any language should also be returned.

I want to filter with ( or language:'en' language:null )

However, null cannot be passed within a string.

Is this possible? If so, how would it be done?

Upvotes: 10

Views: 3913

Answers (4)

justin.m.chase

Reputation: 13675

You can search for existence by using the prefix or range operators, depending on your field type. If the field is a term or a string, you can use prefix like so:

(prefix field=example '')

This will yield only results that are not null for the field example.

For dates you can use an inclusive date range:

(range field=updated ['0000-01-01T00:00:00.000Z',})

This will only include items with an updated date on or after the given time; items with a null updated date will not be included. You can do similar searches for other field types.

Similarly, you can use the not operator to get the set of items with null fields.

For example, all items with a null example field:

(not (prefix field=example ''))
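As a rough sketch of how the two expressions above might be combined for the original question (language is 'en', or there is no language at all), the helper below only builds the structured-query string. The function names are made up for illustration, the combined expression is an assumption that has not been checked against a live domain, and the escaping rule (a backslash before single quotes and backslashes) is my understanding of the structured syntax rather than something stated in this thread.

```python
def quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def exists(field: str) -> str:
    """Clause matching documents where a text/literal field has any value."""
    # An empty prefix matches every non-null value, as described in the answer.
    return f"(prefix field={field} '')"


def missing(field: str) -> str:
    """Clause matching documents where the field is null."""
    return f"(not {exists(field)})"


def value_or_missing(field: str, value: str) -> str:
    """Clause matching documents with the given value OR no value at all."""
    return f"(or {field}:{quote(value)} {missing(field)})"


print(value_or_missing("language", "en"))
# prints: (or language:'en' (not (prefix field=language '')))
```

The resulting string would be sent with the structured query parser, for example as the filter query of a search request.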

Upvotes: 1

maxa

Reputation: 86

If you are willing to use the Lucene query parser you can express your query like this:

(*:* OR -language:*) OR language:en

Note: The funky (*:* OR ...) construct is necessary because of the way Lucene treats negated OR clauses.

In general, you can filter by existence / non-existence of a field with the Lucene query parser:

All documents containing field: field:[* TO *]

All documents not containing field: -field:[* TO *]

Note: If field is textual (text or literal data types), you don't need range queries and can shorten the two queries above to:

field:* (documents containing field) and -field:* (documents not containing field)
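To make the pattern reusable, here is a small sketch that assembles the Lucene expressions from this answer as plain strings. The helper names and the textual flag are hypothetical, and the value is assumed to be a simple token with no Lucene special characters (no escaping is attempted).

```python
def has_field(field: str, textual: bool = False) -> str:
    """Lucene clause for documents that contain the field."""
    return f"{field}:*" if textual else f"{field}:[* TO *]"


def lacks_field(field: str, textual: bool = False) -> str:
    """Lucene clause for documents that do not contain the field.

    Wrapped in (*:* OR ...) so that it can be combined with other OR
    clauses, following the construct used in the answer.
    """
    return f"(*:* OR -{has_field(field, textual)})"


def value_or_missing(field: str, value: str, textual: bool = False) -> str:
    """Documents whose field equals value, or that lack the field entirely."""
    return f"{lacks_field(field, textual)} OR {field}:{value}"


print(value_or_missing("language", "en", textual=True))
# prints: (*:* OR -language:*) OR language:en
```

Passing textual=False produces the range form, which should also cover non-textual fields according to the note above.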

Upvotes: 6

Nyxter

Reputation: 405

I looked elsewhere as well, and it seems:

The simplest way to do that is to set a default value for the field, and then use that value as your null.

For example, set the default to the string "null"; then you can easily test for that.

I believe you can add a default value and re-index, and that should apply the default to existing documents.
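A minimal sketch of the sentinel approach, assuming the language field is a literal field and the default is the string "null". The options dictionary reflects my understanding of the shape boto3's cloudsearch client expects for define_index_field; treat it as an assumption and check it against the API reference before use. apply_default is a hypothetical local stand-in that only mimics what the service-side default would do.

```python
SENTINEL = "null"  # default value standing in for "no language"

# Assumed index field definition; it would be passed as IndexField to
# define_index_field, followed by a re-index of the domain.
LANGUAGE_FIELD = {
    "IndexFieldName": "language",
    "IndexFieldType": "literal",
    "LiteralOptions": {"DefaultValue": SENTINEL, "SearchEnabled": True},
}


def apply_default(fields: dict) -> dict:
    """Local illustration of the default: fill in the sentinel when absent."""
    filled = dict(fields)
    filled.setdefault("language", SENTINEL)
    return filled


def language_filter(language: str) -> str:
    """Structured filter: the requested language OR the sentinel value."""
    return f"(or language:'{language}' language:'{SENTINEL}')"


print(apply_default({"title": "untitled"}))
print(language_filter("en"))
# second line prints: (or language:'en' language:'null')
```

With the sentinel in place, the filter from the question becomes an ordinary two-term or, with no need to express null at query time.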

Upvotes: 4

alexroussos

Reputation: 2681

There is no way to cleanly do exactly what you want, but here are two options:

  1. Index a new field called something like has_language, setting its value to language!=null at doc submission time.
  2. This is more of a hack because range should only be used with integers, but I have used it successfully on literal fields (range field=language [0,}).
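Option 1 could be sketched roughly as follows. Storing has_language as an int field with the values 1 and 0 is my assumption (the answer does not say which field type to use), and the function names are hypothetical. The add operation follows the JSON document batch layout (type, id, fields).

```python
from typing import Any, Dict


def add_has_language(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the fields with a has_language flag (1 or 0)."""
    enriched = dict(fields)
    enriched["has_language"] = 1 if fields.get("language") is not None else 0
    return enriched


def to_add_operation(doc_id: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Build one 'add' entry for a document batch upload."""
    return {"type": "add", "id": doc_id, "fields": add_has_language(fields)}


def language_filter(language: str) -> str:
    """Structured filter: the requested language OR no language at all."""
    return f"(or language:'{language}' has_language:0)"


print(to_add_operation("doc-1", {"title": "Hello", "language": "en"}))
print(to_add_operation("doc-2", {"title": "Untitled"}))
print(language_filter("en"))
# last line prints: (or language:'en' has_language:0)
```

Computing the flag at submission time keeps the query simple, at the cost of re-submitting existing documents so that they pick up the new field.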

Upvotes: 2
