user2719100

Reputation: 1744

How do I set up ElasticSearch nodes on EC2 so they have consistent DNS entries?

I have 3 ElasticSearch nodes in a cluster on AWS EC2. My client apps use connection pooling and have the public IP addresses for all 3 nodes in their config files.

The problem I have is that EC2 seems to occasionally reassign public IP addresses for these instances. They also change if I stop and restart an instance.

My app will actually stay online, since the connection pool will round-robin across the three known IP addresses, but eventually all three will have changed and the app will stop working.
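For reference, the client setup is essentially the following (sketched here with the official Python client just for illustration; the IP addresses and ports are placeholders):

    from elasticsearch import Elasticsearch

    # Hard-coded public IPs of the three data nodes (placeholders).
    # When EC2 reassigns an address, the pool silently loses that node.
    es = Elasticsearch(
        hosts=[
            {"host": "54.0.0.1", "port": 9200},
            {"host": "54.0.0.2", "port": 9200},
            {"host": "54.0.0.3", "port": 9200},
        ],
    )

    print(es.cluster.health())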

So, how should I be setting up an ElasticSearch cluster on EC2 so that my clients can continue to connect even if the instances change IP addresses?

  1. I could use Elastic IPs, but these are limited to 5 per account and I will eventually have many more than 5 nodes (different environments: dev, staging, test, etc.).
  2. I could use Elastic Load Balancers and put one node behind each ELB, but that seems like a pretty hacky solution and an improper use of load balancers.
  3. I could create my own DNS entries under my own domain and update the DNS records whenever I notice an IP address has changed, but that seems really error-prone if no one is checking the IPs every day.

Is there an option I'm missing?

Upvotes: 2

Views: 3675

Answers (2)

John Petrone

Reputation: 27515

Use one or two query-only nodes, referred to in the documentation as "non data" nodes.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html

In front of the cluster we can start one or more "non data" nodes which will start with HTTP enabled. All HTTP communication will be performed through these "non data" nodes.

The benefit of using that is, first, the ability to create smart load balancers. These "non data" nodes are still part of the cluster, and they redirect operations exactly to the node that holds the relevant data. The other benefit is that for scatter/gather based operations (such as search), these nodes will take part in the processing, since they will start the scatter process and perform the actual gather processing.

These nodes don't need much disk (they do query and index processing only). You route all your requests through them. You can add more and more data nodes as you ingest more data without changing these "non data" nodes. You run a couple of them (to be safe) and use either DNS or Elastic IP addresses for them. You need far fewer IP addresses, since these are not data nodes, and you tend not to need to change them as frequently as you do data nodes.

This configuration approach is documented in the elasticsearch.yml file, quoted below:

    # You want this node to be neither master nor data node, but
    # to act as a "search load balancer" (fetching data from nodes,
    # aggregating results, etc.)
    node.master: false
    node.data: false
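With that in place, your clients only need to know the addresses of the one or two "non data" nodes, along these lines (a sketch using the Python client; the hostnames are placeholders):

    from elasticsearch import Elasticsearch

    # The pool targets only the stable "non data" client nodes, which
    # forward each request to whichever data node holds the relevant shards.
    es = Elasticsearch(
        hosts=[
            {"host": "es-client-0.example.com", "port": 9200},
            {"host": "es-client-1.example.com", "port": 9200},
        ],
    )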

Upvotes: 1

GlenRSmith

Reputation: 786

I haven't seen the IP address of a running instance change the way you're describing, but with this approach it shouldn't matter:

Use DNS names for everything, not IP addresses.

Let's say you want to hit your cluster via http://elastic.rabblerabble.com:9200.

Create the EC2 instances for your nodes. Name them elastic-0, elastic-1, and elastic-2.

In EC2 Load Balancers, create an ELB named 'es-elb' that includes each of these instances by name, with port forwarding of port 9200.
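For example, here's roughly how that ELB could be set up with boto3 (the availability zone and instance IDs are placeholders, and this assumes the classic ELB API):

    import boto3

    elb = boto3.client("elb")  # classic Elastic Load Balancing API

    # Create the load balancer and pass port 9200 straight through.
    elb.create_load_balancer(
        LoadBalancerName="es-elb",
        Listeners=[{
            "Protocol": "TCP",
            "LoadBalancerPort": 9200,
            "InstanceProtocol": "TCP",
            "InstancePort": 9200,
        }],
        AvailabilityZones=["us-east-1a"],  # placeholder zone
    )

    # Register the three node instances (placeholder instance IDs).
    elb.register_instances_with_load_balancer(
        LoadBalancerName="es-elb",
        Instances=[
            {"InstanceId": "i-0aaa0000000000000"},
            {"InstanceId": "i-0bbb0000000000000"},
            {"InstanceId": "i-0ccc0000000000000"},
        ],
    )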

In Route 53, create unique CNAMEs for each of your instances, with the Public DNS as the value, and a CNAME for your ELB:

Name                                Type     Value
elastic-0.rabblerabble.com.         CNAME    Public DNS of instance elastic-0
elastic-1.rabblerabble.com.         CNAME    Public DNS of instance elastic-1
elastic-2.rabblerabble.com.         CNAME    Public DNS of instance elastic-2
elastic.rabblerabble.com.           CNAME    Public DNS of ELB es-elb
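If you'd rather script it than click through the console, the same records can be created with boto3, roughly like this (the hosted zone ID and Public DNS values are placeholders):

    import boto3

    route53 = boto3.client("route53")

    # Map each stable name to the instance's (or ELB's) Public DNS name.
    records = {
        "elastic-0.rabblerabble.com.": "ec2-xx-xx-xx-xx.compute-1.amazonaws.com",
        "elastic-1.rabblerabble.com.": "ec2-yy-yy-yy-yy.compute-1.amazonaws.com",
        "elastic-2.rabblerabble.com.": "ec2-zz-zz-zz-zz.compute-1.amazonaws.com",
        "elastic.rabblerabble.com.": "es-elb-1234567890.us-east-1.elb.amazonaws.com",
    }

    changes = [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [{"Value": target}],
            },
        }
        for name, target in records.items()
    ]

    route53.change_resource_record_sets(
        HostedZoneId="ZXXXXXXXXXXXXX",  # placeholder hosted zone ID
        ChangeBatch={"Changes": changes},
    )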

There's more needed for security, health checks, etc. but that's outside the scope of the question.

Upvotes: 2
