Hugo Rodger-Brown

Reputation: 11582

Rebuilding ElasticSearch index using django-haystack rebuild_index command

I am trying to get ElasticSearch / Haystack set up on my local dev environment (vagrant VM running Ubuntu 12.04), and I can't work out the re-indexing process.

ES is running, and I have created a new index (I am using elasticsearch-head to view index status in the browser). I can create a new index, and query it, so I know that ES is working.

My problem is with the Haystack rebuild_index command:

(.venv)vagrant@precise32:/app$ foreman run ./manage.py rebuild_index

WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y

Removing all documents from your index because you said so.
DEBUG Making a request equivalent to this: curl -XDELETE 'http://127.0.0.1:9200/test_app' -d '""'
INFO Starting new HTTP connection (1): 127.0.0.1
DEBUG "DELETE /test_app HTTP/1.1" 200 31
DEBUG response status: 200
DEBUG got response {u'acknowledged': True, u'ok': True}
DEBUG Making a request equivalent to this: curl -XPOST 'http://127.0.0.1:9200/test_app/_refresh' -d '""'
DEBUG "POST /test_app/_refresh HTTP/1.1" 404 66
DEBUG response status: 404
Failed to clear Elasticsearch index: (404, u'IndexMissingException[[test_app] missing]')
ERROR Failed to clear Elasticsearch index: (404, u'IndexMissingException[[test_app] missing]')
All documents removed.

Looking at this logging, it seems that Haystack is attempting to refresh an index that it has just deleted, which would always fail.

What am I doing wrong?

[UPDATE 1]

If I split the POST requests I can get this to run:

(.venv)vagrant@precise32:/app$ curl -XPOST 'http://127.0.0.1:9200/test_app/'
{"ok":true,"acknowledged":true}

(.venv)vagrant@precise32:/app$ curl -XPOST 'http://127.0.0.1:9200/test_app/_refresh' -d '""'
{"ok":true,"_shards":{"total":10,"successful":5,"failed":0}}

[UPDATE 2]

Digging in to the code, the ES backend method that is called when running clear_index is:

    def clear(self, models=[], commit=True):
        [...]
        if not models:
            self.conn.delete_index(self.index_name)
        else:
            [...]
        if commit:
            self.conn.refresh(index=self.index_name)

This looks wrong: when no models are passed, it calls conn.refresh on the index that it has just deleted.
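
To make the suspicion concrete, here is a minimal sketch of that control flow using an in-memory stand-in for the connection. FakeConnection, IndexMissingError, clear_as_quoted and clear_guarded are hypothetical names invented for this illustration; they are not part of Haystack or pyelasticsearch, and the guard shown is only one possible way to avoid the 404, not the library's actual fix.

```python
class IndexMissingError(Exception):
    """Stand-in for the 404 IndexMissingException seen in the log."""


class FakeConnection(object):
    """Hypothetical in-memory stand-in for the ES connection."""

    def __init__(self, indexes):
        self.indexes = set(indexes)

    def delete_index(self, name):
        self.indexes.discard(name)

    def refresh(self, index):
        # Refreshing an index that does not exist fails, as in the log.
        if index not in self.indexes:
            raise IndexMissingError("[%s] missing" % index)


def clear_as_quoted(conn, index_name, models=(), commit=True):
    # Mirrors the quoted control flow for the "no models" case.
    if not models:
        conn.delete_index(index_name)
    if commit:
        conn.refresh(index=index_name)  # the index is already gone here


def clear_guarded(conn, index_name, models=(), commit=True):
    # One possible guard: after a full delete there is nothing to refresh.
    if not models:
        conn.delete_index(index_name)
        return
    # (per-model deletion is elided, as in the quoted snippet)
    if commit:
        conn.refresh(index=index_name)


def raises_missing(func, *args):
    """Return True if calling func(*args) raises IndexMissingError."""
    try:
        func(*args)
    except IndexMissingError:
        return True
    return False
```

With this stand-in, clear_as_quoted raises the "missing" error on every full clear, while clear_guarded does not. That matches the log above, where the failure is reported and the command then carries on to print "All documents removed."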

[UPDATE 3]

I think the errors above may be a red herring: the management command ignores them and continues, and then fails with this error, which looks more serious:

(.venv)vagrant@precise32:/app$ foreman run ./manage.py update_index --verbosity=3
Skipping '<class 'django.contrib.auth.models.Permission'>' - no index.
Skipping '<class 'django.contrib.auth.models.Group'>' - no index.
Skipping '<class 'django.contrib.auth.models.User'>' - no index.
Skipping '<class 'django.contrib.contenttypes.models.ContentType'>' - no index.
Skipping '<class 'django.contrib.sessions.models.Session'>' - no index.
Skipping '<class 'django.contrib.sites.models.Site'>' - no index.
Skipping '<class 'django.contrib.admin.models.LogEntry'>' - no index.
Skipping '<class 'django.contrib.flatpages.models.FlatPage'>' - no index.
ERROR Error updating test_app using default
Traceback (most recent call last):
  File "/home/vagrant/.venv/src/django-haystack/haystack/management/commands/update_index.py", line 210, in handle_label
    self.update_backend(label, using)
  File "/home/vagrant/.venv/src/django-haystack/haystack/management/commands/update_index.py", line 239, in update_backend
    end_date=self.end_date)
  File "/home/vagrant/.venv/src/django-haystack/haystack/indexes.py", line 157, in build_queryset
    index_qs = self.index_queryset(using=using)
TypeError: index_queryset() got an unexpected keyword argument 'using'
Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/vagrant/.venv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 443, in execute_from_command_line
    utility.execute()
  File "/home/vagrant/.venv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/vagrant/.venv/local/lib/python2.7/site-packages/django/core/management/base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/home/vagrant/.venv/local/lib/python2.7/site-packages/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/home/vagrant/.venv/src/django-haystack/haystack/management/commands/update_index.py", line 184, in handle
    return super(Command, self).handle(*items, **options)
  File "/home/vagrant/.venv/local/lib/python2.7/site-packages/django/core/management/base.py", line 341, in handle
    label_output = self.handle_label(label, **options)
  File "/home/vagrant/.venv/src/django-haystack/haystack/management/commands/update_index.py", line 210, in handle_label
    self.update_backend(label, using)
  File "/home/vagrant/.venv/src/django-haystack/haystack/management/commands/update_index.py", line 239, in update_backend
    end_date=self.end_date)
  File "/home/vagrant/.venv/src/django-haystack/haystack/indexes.py", line 157, in build_queryset
    index_qs = self.index_queryset(using=using)
TypeError: index_queryset() got an unexpected keyword argument 'using'

[UPDATE 4]

OK, so it's my fault: I was using an old search_indexes.py file, and my index_queryset() method had an outdated signature. I won't close this, as it may be useful to others.

Upvotes: 4

Views: 4080

Answers (1)

Hugo Rodger-Brown

Reputation: 11582

Answering this one myself, although it's really just an admission of my own stupidity.

I carried a search_indexes.py file from the 1.x version of Haystack into a new branch of our project that uses Haystack 2.x, which is configured slightly differently. In 2.x, index_queryset() is called with a using keyword argument, so the method must accept it (it defaults to None). The 1.x signature did not have this parameter.

The new signature should therefore be:

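def index_queryset(self, using=None):
    # Return the objects to index; assumes the model's manager is named 'objects'.
    return self.get_model().objects.all()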
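
For anyone who wants to see the failure in isolation, here is a small sketch that reproduces the TypeError without Haystack or Django installed. OldStyleIndex, NewStyleIndex, build_queryset and call_result are hypothetical stand-ins written for this example; only the call shape index_queryset(using=using) is taken from the traceback in the question.

```python
class OldStyleIndex(object):
    """Hypothetical 1.x-style index: index_queryset() takes no 'using'."""

    def index_queryset(self):
        return ["doc-1", "doc-2"]


class NewStyleIndex(object):
    """Hypothetical 2.x-style index: accepts 'using', defaulting to None."""

    def index_queryset(self, using=None):
        return ["doc-1", "doc-2"]


def build_queryset(index, using=None):
    # Same call shape as the line shown in the traceback.
    return index.index_queryset(using=using)


def call_result(index):
    """Return the queryset, or the TypeError message if the call fails."""
    try:
        return build_queryset(index, using="default")
    except TypeError as exc:
        return "TypeError: %s" % exc
```

Calling call_result(OldStyleIndex()) returns a TypeError message complaining about the unexpected keyword argument 'using', while call_result(NewStyleIndex()) returns the list. Adding using=None to the old method is enough to make the first case behave like the second.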

Upvotes: 5
