Scaling Write Throughput for 1e13 Small Events with Citus

Question

In a past job I worked with financial tick data, a time series of events where each event/record is about 100 bytes and has 15-20 columns. A full time series has order 1e12 events quickly growing to 1e13. The mostly uncompressed data required about 300 TB with replication and was distributed over 10 hefty bare metal nodes.

I recently learned of Citus, and as of version 11 it supports load-balanced queries across all workers. It's open source, straightforward to deploy, and offers very high throughput for both online and analytical queries. It looks great for many low-latency queries, much better than the solution we ended up with.

Except one big issue: loading all the data. Citus' Query Performance Tuning page says writes must go through the single coordinator node and "up to" 2M records/second is possible for bulk loading. For 1e13 records that would take about 60 days which is too long. Even if a more powerful server allows for 10x the write throughput it would still take a full week to load - not completely game breaking, but reducing this to one day or overnight would change the game entirely.

But another link for Cluster Management has a tantalizing suggestion:

However, in some write heavy use cases where the coordinator becomes a performance bottleneck, users can add another coordinator. As the metadata tables are small (typically a few MBs in size), it is possible to copy over the metadata onto another node and sync it regularly. Once this is done, users can send their queries to any coordinator and scale out performance. If your setup requires you to use multiple coordinators, please contact us.

In a few hours of searching I couldn't find anything more about this. Are there any guides or suggestions about how to add and use multiple coordinators to increase write throughput? In this case a single table with row-based sharding would be fine so the metadata should be mostly static.

My current best guess is 1. create a "leader coordinator" for lack of a better term, 2. stream its WAL to several "follower coordinators," then 3. any ordinary insert, update, and delete statements can be load-balanced across all coordinators. But any operation that changes the metadata, such as adding or removing columns or workers, would have to go through the "leader coordinator."

** note: that link for "contact us" includes Stack Overflow... so consider yourself contacted!

Scaling Write Throughput for 1e13 Small Events with Citus

Answers (0)

Related Questions