All queries with 'ON CLUSTER' clause timed out with error message 'There is no a local address in host list'

Question

We're setting up a ClickHouse cluster (version 20.1.8.41) on 7 nodes, using a "circular replica" pattern (i.e. 7 shards * 2 replicas, with each shard's replicas on different nodes), plus a separate ZooKeeper cluster.

The /etc/hosts files are all correctly configured, and the cluster started successfully.

However, when we execute distributed DDL queries, they all hang and eventually time out, e.g.:

:) create database ods on cluster sht_ck_cluster_1;

CREATE DATABASE ods ON CLUSTER sht_ck_cluster_1

→ Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.)
Received exception from server (version 20.1.8):
Code: 159. DB::Exception: Received from localhost:9002. DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000007 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 14 unfinished hosts (0 of them are currently active), they are going to execute the query in background. 

0 rows in set. Elapsed: 180.589 sec.

The clickhouse-server.log on the client node shows the following:

2020.04.23 00:33:33.327414 [ 32 ] {c3c49bd3-333d-4fca-aa2f-2520f5c0cb9f}  executeQuery: Code: 159, e.displayText() = DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000007 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 14 unfinished hosts (0 of them are currently active), they are going to execute the query in background (version 20.1.8.41) (from 127.0.0.1:42198) (in query: CREATE DATABASE ods ON CLUSTER sht_ck_cluster_1), Stack trace (when copying this message, always include the lines below):

0. 0xb2087bc Poco::Exception::Exception(std::__1::basic_string, std::__1::allocator > const&, int)  in /usr/bin/clickhouse
1. 0x4d8e3c9 DB::Exception::Exception(std::__1::basic_string, std::__1::allocator > const&, int)  in /usr/bin/clickhouse
2. 0x84846b9 DB::DDLQueryStatusInputStream::readImpl()  in /usr/bin/clickhouse
3. 0x8345e3f DB::IBlockInputStream::read()  in /usr/bin/clickhouse
4. 0x833d541 DB::AsynchronousBlockInputStream::calculate()  in /usr/bin/clickhouse
5. 0x833e113 ?  in /usr/bin/clickhouse
6. 0x4dc8b7a ThreadPoolImpl::worker(std::__1::__list_iterator)  in /usr/bin/clickhouse
7. 0x4dc9790 ThreadFromGlobalPool::ThreadFromGlobalPool::scheduleImpl(std::__1::function, int, std::__1::optional)::'lambda1'()>(void&&, void ThreadPoolImpl::scheduleImpl(std::__1::function, int, std::__1::optional)::'lambda1'()&&...)::'lambda'()::operator()() const  in /usr/bin/clickhouse
8. 0x4dc7eca ThreadPoolImpl::worker(std::__1::__list_iterator)  in /usr/bin/clickhouse
9. 0x4dc69dc ?  in /usr/bin/clickhouse
10. 0x7e25 start_thread  in /usr/lib64/libpthread-2.17.so
11. 0xfebad clone  in /usr/lib64/libc-2.17.so

What seems weird is that the clickhouse-server.log on the other nodes says:

2020.04.23 00:30:32.744984 [ 23 ] {}  DDLWorker: Processing tasks
2020.04.23 00:30:32.744992 [ 24 ] {}  DDLWorker: Cleaning queue
2020.04.23 00:30:32.746629 [ 23 ] {}  DDLWorker: Will not execute task query-0000000007: There is no a local address in host list
2020.04.23 00:30:32.746641 [ 23 ] {}  DDLWorker: Waiting a watch

I'm completely confused about this message. I've tried restarting the cluster, disabling DNS cache, and setting the parameter explicitly, but nothing worked.
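
The "no a local address in host list" message suggests that each node looked at the hosts recorded in the task and did not recognize any of them as itself. A minimal sketch of a check that could be run on every node to see which configured host names resolve to an address owned by that machine is shown here. The function name and the bind-based test are illustrative assumptions, not ClickHouse's actual implementation, and the sketch checks only the address, not the port.

```python
import socket


def resolves_to_local_address(host: str) -> bool:
    """Return True if `host` resolves to an address this machine can bind to.

    Hypothetical helper for diagnosis only; it is not part of ClickHouse.
    """
    try:
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve at all on this node.
        return False

    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                # Binding to port 0 normally succeeds only when the address
                # belongs to one of this machine's network interfaces.
                s.bind((sockaddr[0], 0) + tuple(sockaddr[2:]))
                return True
        except OSError:
            continue
    return False
```

Calling `resolves_to_local_address(name)` for each host name listed for the cluster in the server configuration should return True for the node's own entries and False for the others. If a node reports False for its own entry, name resolution on that node would be the first thing to look at. The port listed for each replica presumably also has to match the port the local server listens on (the client above connected to localhost:9002), which this sketch does not verify.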

What else should I do? Many thanks.

Regards
