Gillespie

Reputation: 2228

NEST Elasticsearch Reindex examples

My objective is to reindex an index with 10 million shards in order to change field mappings to facilitate significant terms analysis.

My problem is that I am having trouble using the NEST library to perform a re-index, and the documentation is (very) limited. If possible I need an example of the following in use:

http://nest.azurewebsites.net/nest/search/scroll.html

http://nest.azurewebsites.net/nest/core/bulk.html

Upvotes: 9

Views: 6508

Answers (3)

Jonas Code North

Reputation: 1

I second Ben Wilde's answer above. Better to have full control over index creation and the re-index process.

What's missing from Ben's code is support for parent/child relationships. Here is my code to fix that:

Replace the following lines:

foreach (var hit in searchResult.Hits)
{
    b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
}

With this:

foreach (var hit in searchResult.Hits)
{
    var jo = hit.Source as JObject;
    JToken jt;
    if(jo != null && jo.TryGetValue("parentId", out jt))
    {
        // Document is child-document => add parent reference
        string parentId = (string)jt;
        b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id).Parent(parentId));
    }
    else
    {
        b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
    }
}

Upvotes: 0

Ben Wilde

Reputation: 5672

Unfortunately the NEST implementation is not quite what I expected. In my opinion it's a bit over-engineered for possibly the most common use case.

A lot of people just want to update their mappings with zero downtime...

In my case - I had already taken care of creating the index with all its settings and mappings, but NEST insists that it must create a new index when reindexing. That among many other things. Too many other things.

I found it much less complicated to just implement it directly, since NEST already has Search, Scroll, and Bulk methods. (This is adapted from NEST's implementation):

// Assuming you have already created and set up the index yourself
public void Reindex(ElasticClient client, string aliasName, string currentIndexName, string nextIndexName)
{
    Console.WriteLine("Reindexing documents to new index...");
    var searchResult = client.Search<object>(s => s.Index(currentIndexName).AllTypes().From(0).Size(100).Query(q => q.MatchAll()).SearchType(SearchType.Scan).Scroll("2m"));
    if (searchResult.Total <= 0)
    {
        Console.WriteLine("Existing index has no documents, nothing to reindex.");
    }
    else
    {
        var page = 0;
        IBulkResponse bulkResponse = null;
        do
        {
            var result = searchResult;
            searchResult = client.Scroll<object>(s => s.Scroll("2m").ScrollId(result.ScrollId));
            // Check the scroll response first, so a failed scroll is not mistaken for "no more documents"
            searchResult.ThrowOnError("reindex scroll " + page);
            if (searchResult.Documents != null && searchResult.Documents.Any())
            {
                bulkResponse = client.Bulk(b =>
                {
                    foreach (var hit in searchResult.Hits)
                    {
                        b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
                    }

                    return b;
                }).ThrowOnError("reindex page " + page);
                Console.WriteLine("Reindexing progress: " + (page + 1) * 100);
            }

            ++page;
        }
        while (searchResult.IsValid && bulkResponse != null && bulkResponse.IsValid && searchResult.Documents != null && searchResult.Documents.Any());
        Console.WriteLine("Reindexing complete!");
    }

    Console.WriteLine("Updating alias to point to new index...");
    client.Alias(a => a
        .Add(aa => aa.Alias(aliasName).Index(nextIndexName))
        .Remove(aa => aa.Alias(aliasName).Index(currentIndexName)));

    // TODO: Don't forget to delete the old index if you want
}

And the ThrowOnError extension method in case you want it:

public static T ThrowOnError<T>(this T response, string actionDescription = null) where T : IResponse
{
    if (!response.IsValid)
    {
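        // ServerError can be null when the failure is not a server-side error (e.g. a connection problem)
        var error = response.ServerError != null ? response.ServerError.Error : "no server error details available";
        throw new CustomExceptionOfYourChoice(actionDescription == null ? string.Empty : "Failed to " + actionDescription + ": " + error);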
    }

    return response;
}

Upvotes: 6

batwad

Reputation: 3665

NEST provides a nice Reindex method you can use, although the documentation is lacking. I've used it in a very rough-and-ready fashion with this ad-hoc WinForms code.

    private ElasticClient client;
    private double count;

    private void reindex_Completed()
    {
        MessageBox.Show("Done!");
    }

    private void reindex_Next(IReindexResponse<object> obj)
    {
        count += obj.BulkResponse.Items.Count();
        var progress = 100 * count / (double)obj.SearchResponse.Total;
        progressBar1.Value = (int)progress;
    }

    private void reindex_Error(Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }

    private void button1_Click(object sender, EventArgs e)
    {
        count = 0;

        var reindex = client.Reindex<object>(r => r.FromIndex(fromIndex.Text).NewIndexName(toIndex.Text).Scroll("10s"));

        var o = new ReindexObserver<object>(onError: reindex_Error, onNext: reindex_Next, completed: reindex_Completed);
        reindex.Subscribe(o);
    }
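
One thing to watch in reindex_Next: the computed percentage goes straight into progressBar1.Value, and a WinForms ProgressBar throws if the value falls outside its Minimum..Maximum range (0..100 by default). That could happen if Total comes back as 0, or if the running count overshoots the total. A minimal sketch of a guard, assuming the default 0..100 range; ReindexProgress and ToPercent are hypothetical names of my own, not part of NEST:

```csharp
using System;

public static class ReindexProgress
{
    // Hypothetical helper: converts a running count into a percentage
    // that is always safe to assign to a ProgressBar with the default 0..100 range.
    public static int ToPercent(double processed, double total)
    {
        // Avoid dividing by zero when the source index reports no documents
        if (total <= 0)
        {
            return 0;
        }

        var percent = 100.0 * processed / total;

        // Clamp so an overshooting count can never exceed the bar's maximum
        return (int)Math.Max(0.0, Math.Min(100.0, percent));
    }
}
```

In the handler it would be used as progressBar1.Value = ReindexProgress.ToPercent(count, obj.SearchResponse.Total);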

And I've just found the blog post that showed me how to do it: http://thomasardal.com/elasticsearch-migrations-with-c-and-nest/

Upvotes: 14
