Marcin78

Reputation: 21

Embedded Elasticsearch - second startup takes a long time

I am working on a solution that uses an embedded Elasticsearch server on a single local machine. The scenario is:

1) Create a cluster with one node and import data: 3 million records in ~180 indexes and 911 shards. The data is available, search works and returns the expected results, and the cluster health looks good:

{
  "cluster_name" : "cn1441023806894",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 911,
  "active_shards" : 911,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

2) Now I shut down the server. This is my console output:

Aug 31, 2015 2:51:36 PM org.elasticsearch.node.internal.InternalNode stop
INFO: [testbg] stopping ...
Aug 31, 2015 2:51:50 PM org.elasticsearch.node.internal.InternalNode stop
INFO: [testbg] stopped
Aug 31, 2015 2:51:50 PM org.elasticsearch.node.internal.InternalNode close
INFO: [testbg] closing ...
Aug 31, 2015 2:51:50 PM org.elasticsearch.node.internal.InternalNode close
INFO: [testbg] closed

The database folder is around 2.4 GB.

3) Now I start the server again, and it takes around 10 minutes to reach status green. Example health during that time:

{
  "cluster_name" : "cn1441023806894",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 68,
  "active_shards" : 68,
  "relocating_shards" : 0,
  "initializing_shards" : 25,
  "unassigned_shards" : 818
}

After that process, the database folder is ~0.8 GB.

Then I shut down the database and open it again, and now it goes green in 10 seconds. All subsequent close/start operations are quite fast.
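
For reference, the recovery progress implied by the health output in step 3 can be estimated from the shard counts alone, which is the kind of number a "please wait" dialog could display. This is only a minimal sketch: the class and method names are hypothetical, and it assumes a single node with no replicas and no relocating shards, as in the setup described here.

```java
// Hypothetical helper; class and method names are illustrative only.
final class RecoveryProgress {
    private RecoveryProgress() {}

    /**
     * Returns the share of shards that are already active as a whole
     * percentage (0-100), rounded down. The three arguments correspond to
     * the active_shards, initializing_shards and unassigned_shards fields
     * of the cluster health response.
     */
    static int percentActive(int activeShards, int initializingShards, int unassignedShards) {
        int total = activeShards + initializingShards + unassignedShards;
        if (total == 0) {
            return 100; // no shards at all: nothing to wait for
        }
        // Widen to long before multiplying to avoid int overflow.
        return (int) ((long) activeShards * 100 / total);
    }
}
```

With the numbers from step 3 (68 active, 25 initializing, 818 unassigned, 911 in total), `RecoveryProgress.percentActive(68, 25, 818)` gives 7, and the green state from step 1 gives 100.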

My configuration:

settings.put(SET_NODE_NAME, projectNameLC);
settings.put(SET_PATH_DATA, projectLocation + "\\" + CommonConstants.ANALYZER_DB_FOLDER); 
settings.put(SET_CLUSTER_NAME, clusterName);
settings.put(SET_NODE_DATA, true);
settings.put(SET_NODE_LOCAL, true);
settings.put(SET_INDEX_REFRESH_INTERVAL, "-1");
settings.put(SET_INDEX_MERGE_ASYNC, true);
//the following settings are my attempt to speed up loading on the 2nd startup
settings.put("cluster.routing.allocation.disk.threshold_enabled", false);
settings.put("index.number_of_replicas", 0);
settings.put("cluster.routing.allocation.disk.include_relocations", false);
settings.put("cluster.routing.allocation.node_initial_primaries_recoveries", 25);
settings.put("cluster.routing.allocation.node_concurrent_recoveries", 8);
settings.put("indices.recovery.concurrent_streams", 6);
settings.put("indices.recovery.concurrent_small_file_streams", 4);

The questions:

1) What happens during the second startup? The database folder shrinks from 2.4 GB to 0.8 GB.

2) If this process is necessary, can it be triggered manually, so that I can show a nice "please wait" dialog?

The user experience on the second database opening is very bad and I need to change it.

Cheers, Marcin

Upvotes: 1

Views: 360

Answers (1)

Marcin78

Reputation: 21

On another forum (https://discuss.elastic.co/t/initializing-shards-second-db-start-up-takes-long-time/28357) I got an answer from Mike Simos. The solution is to call a synced flush on an index after I have finished adding data to it:

client.admin().indices().flush(new FlushRequest(idxName));

And it did the trick: my database now starts in 30 seconds instead of 10 minutes. The time needed to flush the data has moved to the import part of my business logic, which is acceptable. I also noticed that the impact on import time is not very big.
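
Since the flush now happens during import, that is also a natural place to show the "please wait" dialog. Below is a minimal sketch of one way to flush every imported index in turn and report progress. The class name, method name and callback parameters are hypothetical. The flush itself is passed in as a callback so the loop does not depend on a particular client version; in my case that callback would wrap the flush call shown above.

```java
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical helper; class, method and parameter names are illustrative only.
final class PostImportFlush {
    private PostImportFlush() {}

    /**
     * Flushes each index in order and reports progress after each one.
     *
     * @param indexNames names of the indexes that received imported data
     * @param flushIndex performs the flush for one index, for example
     *                   name -> client.admin().indices()
     *                               .flush(new FlushRequest(name)).actionGet()
     *                   (assumption: actionGet() is added so that the loop
     *                   waits for each flush to finish)
     * @param onProgress receives (indexes done so far, total indexes),
     *                   e.g. to update a progress dialog
     * @return the number of indexes flushed
     */
    static int flushAll(List<String> indexNames,
                        Consumer<String> flushIndex,
                        BiConsumer<Integer, Integer> onProgress) {
        int total = indexNames.size();
        int done = 0;
        for (String name : indexNames) {
            flushIndex.accept(name);
            done++;
            onProgress.accept(done, total);
        }
        return done;
    }
}
```

If any flush throws, the loop stops at that index and the exception propagates to the caller, so the dialog code has to handle that case itself.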

Upvotes: 1
