oblivion

Reputation: 6548

Terminology of distributed systems: node, shard, cluster

I am having a hard time understanding the terminology of distributed computing:

1) What is a node? Is it simply one machine within a distributed system, or is it one of the processes run by a single machine?

2) What is the relation between a shard and a node within a cluster?

3) I understand that sharding separates the data in a table/collection across multiple shards using shard keys. Is sharding a physical separation or a logical separation?

Upvotes: 2

Views: 239

Answers (3)

oblivion

Reputation: 6548

I found all my answers and cleared up my confusion here: Elastic Search 5.x: Basic Concepts

Note: this reference guide is for version 5.x. I was previously looking at the 2.x version, which does not have a clear explanation of these issues. The links provided by @Artholl in his answer also belong to 2.x.

Upvotes: 0

Artholl

Reputation: 93

Considering the elasticsearch tag on your question, here is the Elasticsearch nomenclature:

According to https://www.elastic.co/guide/en/elasticsearch/guide/current/_an_empty_cluster.html

Elasticsearch Node:

A node is a running instance of Elasticsearch

Elasticsearch Cluster:

A cluster consists of one or more nodes with the same cluster.name that are working together to share their data and workload.

According to https://www.elastic.co/guide/en/elasticsearch/guide/current/_add_an_index.html

Elasticsearch Shard:

A shard is a low-level worker unit that holds just a slice of all the data in the index.

A shard is a single instance of Lucene, and is a complete search engine in its own right
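To make the relation between a shard and a node more concrete, here is a minimal sketch of how a document might be assigned to a shard and how shards might be spread over nodes. The same guide describes routing as `shard = hash(routing) % number_of_primary_shards`; the CRC32 hash, the round-robin placement, and every function and node name below are assumptions for illustration only, not Elasticsearch's actual implementation.

```python
import zlib
from typing import Dict, List


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Pick a shard by hashing the routing key.

    CRC32 is only a stand-in hash chosen for this sketch.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def place_shards(num_primary_shards: int, nodes: List[str]) -> Dict[int, str]:
    """Assign each shard to a node round-robin (hypothetical placement policy)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_primary_shards)}


def node_for(routing_key: str, num_primary_shards: int, nodes: List[str]) -> str:
    """Return the node hosting the shard that owns this routing key."""
    placement = place_shards(num_primary_shards, nodes)
    return placement[shard_for(routing_key, num_primary_shards)]


if __name__ == "__main__":
    nodes = ["node-1", "node-2"]  # hypothetical node names
    for doc_id in ["user-1", "user-2", "user-3"]:
        print(doc_id, "-> shard", shard_for(doc_id, 3), "on", node_for(doc_id, 3, nodes))
```

In this picture a shard is a logical slice of the index, while a node is the running process that physically hosts one or more shards.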

Now that we have seen the concepts of cluster, node and shard in Elasticsearch, we can see that these definitions are quite different from the ones given by xosp7tom, because they are specific to ES.

One piece of advice would be to read the Elasticsearch chapter at https://www.elastic.co/guide/en/elasticsearch/guide/current/distributed-cluster.html if you want more information on how the Elasticsearch team built their distributed search engine. It is pretty interesting and a good introduction to distributed systems!

Upvotes: 1

xosp7tom

Reputation: 2183

Regarding 1):

A node refers to one machine in a cluster. A socket refers to one processor of a machine. A core refers to one processing unit of a socket. "CPU" is typically used to mean the same thing as a core.

For example, Tianhe-2 - as one cluster - has 130,000 nodes, 260,000 sockets, and 3,120,000 cores. https://www.top500.org/system/177999
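As a quick check of how these levels nest, the figures quoted above can be broken down per node and per socket. This is only arithmetic on the numbers given in this answer; the variable names are made up for the sketch.

```python
# Figures as quoted in this answer for Tianhe-2.
nodes = 130_000
sockets = 260_000
cores = 3_120_000

# Each level of the hierarchy divides evenly into the next one.
sockets_per_node = sockets // nodes    # sockets in one machine
cores_per_socket = cores // sockets    # cores in one processor

print(sockets_per_node, "sockets per node")
print(cores_per_socket, "cores per socket")

# cluster cores = nodes x sockets per node x cores per socket
assert nodes * sockets_per_node * cores_per_socket == cores
```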

Upvotes: 1
