user8420733

Behavior in a clustered environment

In a clustered environment, I assume the load balancer passes each request to one of the E-nodes. How does each E-node know which D-nodes to access when a particular query is executed? I am a bit confused about how indexes and caches work in a clustered environment.

Upvotes: 1

Views: 43

Answers (1)

grtjn

Reputation: 20414

Let me explain the distinction between E- and D-nodes first.

Any host that participates in a MarkLogic cluster can potentially operate as E or D, or even both.

Whether a host operates as an E-node depends on whether it is in a group with app servers that are relevant to you, such as one that exposes a REST API you need. So, not just Admin or App-Services, but usually something more specific.

Whether a host operates as a D-node depends on whether it holds any forests of a database that is relevant to you, such as one that holds part or all of the data used by a relevant app server. So, not just Modules or Documents, but usually something more specific.

All hosts in a cluster have a complete copy of the cluster config. MarkLogic will take care of getting data when one host needs data located in a forest on a different host.
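To make that concrete, here is a toy model in Python of how a shared config lets whichever host receives a request work out which hosts hold the forests of a database. This is only a sketch of the idea, not MarkLogic code or its API; the host, forest, and database names are all made up.

```python
# Toy model only: every name below is hypothetical, not a MarkLogic API.
# Each host in the cluster is assumed to hold the same copy of this config.
CLUSTER_CONFIG = {
    "databases": {
        "my-content": ["content-f1", "content-f2", "content-f3"],
        "my-modules": ["modules-f1"],
    },
    "forest_hosts": {
        "content-f1": "host-a",
        "content-f2": "host-b",
        "content-f3": "host-c",
        "modules-f1": "host-a",
    },
}


def d_nodes_for(database, config=CLUSTER_CONFIG):
    """Return the sorted hosts that hold at least one forest of `database`."""
    forests = config["databases"][database]
    return sorted({config["forest_hosts"][f] for f in forests})


def forests_by_host(database, config=CLUSTER_CONFIG):
    """Group the database's forests by the host that stores them."""
    grouped = {}
    for forest in config["databases"][database]:
        grouped.setdefault(config["forest_hosts"][forest], []).append(forest)
    return grouped


print(d_nodes_for("my-content"))
print(forests_by_host("my-modules"))
```

In this model, any host evaluating a query against `my-content` can look up the same three D-nodes, because the lookup only depends on the shared config and not on which host received the request.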

So, D-nodes are related to data-storage, and that includes indexes, both on disk and in memory.

E-nodes are used to 'evaluate' incoming requests, hence the 'E'. Some caching happens on D-nodes, but expanded tree caches and such typically reside on the E-nodes, so that they don't need to access other hosts to fetch data.

You normally don't need to worry too much about all this, until you reach a stage where you need to tweak performance, which can be very case specific. It can be useful to ask MarkLogic to help with that, if you are in a position to do so.

Now, with regard to load balancing: that only concerns incoming requests, so it is relevant to E-nodes only. If all hosts are in one Group (not uncommon), every host can act as an E-node. The load balancer will need to know the network IPs or names of those machines to relay traffic. In a virtualized environment you probably want to take it a step further and allow automatic scaling up and down. The MarkLogic Query Service is also relevant to this.
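As a rough illustration of the load-balancing part, here is a minimal round-robin sketch in Python. It assumes a fixed list of E-node host names (the names are placeholders), and it only shows the rotation idea; a real load balancer would add health checks and, with auto-scaling, a changing host list.

```python
from itertools import cycle

# Hypothetical host names; a real load balancer would be configured with
# the actual network names or IPs of the hosts serving the app server.
E_NODES = ["host-a.example.com", "host-b.example.com", "host-c.example.com"]


def make_round_robin(hosts):
    """Return a function that gives the next host in rotation on each call."""
    rotation = cycle(hosts)
    return lambda: next(rotation)


next_host = make_round_robin(E_NODES)

# Pick a target for four consecutive incoming requests.
targets = [next_host() for _ in range(4)]
print(targets)
```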

HTH!

Upvotes: 2
