zahory

Reputation: 1041

Elasticsearch "get by index" returns the document, while "match_all" returns no results

I am trying to mock elasticsearch data for hosted CI unit-testing purposes.

I have prepared some fixtures that I can successfully load with bulk(), but then, for some unknown reason, I cannot match anything, even though test_index seemingly contains the data (I can get() items by their IDs).

fixtures.json is a subset of ES documents that I fetched from a real production index. Against the real index, everything works as expected and all tests pass.

An artificial example of the strange behaviour follows:

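import json
from unittest import TestCase  # assumption: could equally be django.test.TestCase

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk


class MyTestCase(TestCase):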
    es = Elasticsearch()

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.es.indices.create('test_index', SOME_SCHEMA)

        with open('fixtures.json') as fixtures:
            bulk(cls.es, json.load(fixtures))

    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()
        cls.es.indices.delete('test_index')

    def test_something(self):
        # check all documents are there:
        with open('fixtures.json') as fixtures:
            for f in json.load(fixtures):
                print(self.es.get(index='test_index', id=f['_id']))
                # yes they are!

        # BUT:
        match_all = {"query": {"match_all": {}}}
        print('hits:', self.es.search(index='test_index', body=match_all)['hits']['hits'])
        # prints `hits: []` as if the index were empty

        print('count:', self.es.count(index='test_index', body=match_all)['count'])
        # prints `count: 0`

Upvotes: 0

Views: 241

Answers (2)

Debosmit Ray

Reputation: 5403

While @jsmesami is correct in his answer, there is a possibly cleaner way of doing this. The issue is that ES has not yet refreshed the index, so the newly indexed documents are not visible to search. The API exposes functions for this very purpose. Try something like:

cls.es.indices.flush(wait_if_ongoing=True)
cls.es.indices.refresh(index='*')

To be more specific, you can pass index='test_index' to both of these functions. I think this is cleaner and more targeted than using sleep(..).
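As a rough sketch of the refresh-then-query idea, the helper below forces a refresh on one index and then counts its documents. The name refresh_and_count is hypothetical, and a MagicMock stands in for the client so the snippet runs without a cluster; in a real test you would pass your Elasticsearch() instance instead.

```python
from unittest.mock import MagicMock


def refresh_and_count(es, index):
    """Make pending writes searchable, then return the document count."""
    # indices.refresh() forces a refresh so that documents indexed so far
    # become visible to search and count requests.
    es.indices.refresh(index=index)
    return es.count(index=index)["count"]


# Stand-in client (assumption: replaces a real Elasticsearch() instance
# purely so this sketch is runnable on its own).
fake_es = MagicMock()
fake_es.count.return_value = {"count": 3}

doc_count = refresh_and_count(fake_es, "test_index")
print(doc_count)
```

As far as I know, the bulk() helper also forwards a refresh=True keyword to the underlying bulk request, which would have a similar effect, but check that against your client version.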

Upvotes: 1

jsmesami

Reputation: 128

While I can completely understand your pain (everything works except for the tests), the answer is actually quite simple: the tests, in contrast to your experiments, are too quick.

  • Elasticsearch is a near-real-time search engine: with the default refresh interval, there can be a delay of up to about 1s between indexing a document and it becoming searchable.
  • There is also an unpredictable delay (depending on the actual overhead) between creating an index and it being ready.
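To make the first point concrete, here is a deliberately simplified toy model (not Elasticsearch itself; ToyIndex and its methods are made up for illustration). It mimics what the question observes: a lookup by ID sees a freshly written document, while a search sees it only after a refresh.

```python
class ToyIndex:
    """Deliberately simplified model of near-real-time search (not real ES)."""

    def __init__(self):
        self._pending = {}     # written, but not yet refreshed
        self._searchable = {}  # visible to search after a refresh

    def index(self, doc_id, doc):
        self._pending[doc_id] = doc

    def get(self, doc_id):
        # Lookups by ID see pending writes as well as refreshed ones.
        if doc_id in self._pending:
            return self._pending[doc_id]
        return self._searchable[doc_id]

    def search_all(self):
        # Search only sees what the last refresh exposed.
        return list(self._searchable.values())

    def refresh(self):
        # Move everything written so far into the searchable set.
        self._searchable.update(self._pending)
        self._pending.clear()


toy = ToyIndex()
toy.index("1", {"title": "fixture"})

hits_before = toy.search_all()  # nothing searchable yet
fetched = toy.get("1")          # but the document can be fetched by ID
toy.refresh()
hits_after = toy.search_all()   # now the document shows up in search

print(hits_before, fetched, hits_after)
```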

So the fix would be time.sleep(), to give ES some time to do all the sorcery it needs before it can give you results. I would do this:

@classmethod
def setUpClass(cls):
    super().setUpClass()
    cls.es.indices.create('test_index', SOME_SCHEMA)

    with open('fixtures.json') as fixtures:
        bulk(cls.es, json.load(fixtures))

    cls.wait_until_index_ready()

@classmethod
def wait_until_index_ready(cls, timeout=10):
    for sec in range(timeout):
        time.sleep(1)
        if cls.es.cluster.health().get('status') in ('green', 'yellow'):
            break
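A variation on the same idea, sketched under assumptions: instead of sleeping a full second per attempt, poll a condition at short intervals and report whether it was met before the timeout. wait_until is a hypothetical helper, and the predicate here is a stand-in; in a real test it might compare es.count(index='test_index')['count'] with the number of fixtures.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            # Give the caller a chance to fail the test explicitly.
            return False
        time.sleep(interval)


# Stand-in for a count request: reports 0 documents twice, then 3.
responses = iter([0, 0, 3])
ready = wait_until(lambda: next(responses) >= 3, timeout=5.0, interval=0.01)
print(ready)
```

Returning a boolean lets setUpClass raise a clear error when the index never becomes ready, rather than continuing silently once the loop runs out.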

Upvotes: 1
