Matt K

Reputation: 4948

Rethinkdb 2.2 changefeeds with include_initial

For the simplest example possible, let's say I'm pushing a list of my favorite foods to everyone who subscribes.

r.table('food').changes().run(conn, (err, cursor) => {
  if (err) throw err;
  cursor.each((err, change) => {
    if (err) throw err;
    io.emit('NEW_FAVORITE', change);
  });
});

Now let's say I have 500 people actively watching me add my favorite foods. What would be more performant, 500 people subscribed to 500 changefeeds that each have include_initial, or 500 initial queries pushed to those individuals & then 500 people watching 1 changefeed? Bonus points for explaining why!

Upvotes: 4

Views: 167

Answers (1)

mlucy

Reputation: 5289

You can't have multiple clients reading from one changefeed, so the only way to get 500 people watching one changefeed is to have a single client reading from that changefeed and then pushing to 500 people.

RethinkDB deduplicates changefeed messages inside the cluster when multiple clients are subscribed to the same table, so this isn't really any different from having 500 open changefeeds in terms of network traffic. The server will use a little more memory because it's tracking which changefeeds have received which messages, but if you had one client reading from a changefeed and pushing to 500 people, it would have to track that too.

The real reason to use include_initial, though, is that it prevents races. If you do a read and then open a changefeed, a change can occur between the end of the read and the moment the changefeed starts, and that change is lost. include_initial prevents that by atomically switching from reading the existing rows to passing on changes.
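To illustrate, here's a sketch of handling a feed opened with the JavaScript driver's `includeInitial` option (the camelCase form of `include_initial`). With a live server you'd get `cursor` from `r.table('food').changes({includeInitial: true}).run(conn, cb)`; here the cursor's message sequence is simulated so the handling logic can run standalone. The exact message shapes assume RethinkDB 2.2's documented format: initial rows arrive with only `new_val`, live inserts with `new_val` and `old_val: null`.

```javascript
// Handle one changefeed message; `emit` stands in for io.emit.
function handleChange(change, emit) {
  if (change.state) return; // 'initializing'/'ready' markers (includeStates)
  if (change.new_val && change.old_val == null) {
    // == null covers both "no old_val key" (initial row) and
    // "old_val: null" (live insert).
    emit('NEW_FAVORITE', change.new_val);
  }
}

// Simulated feed: two initial rows, then a live insert.
const messages = [
  { new_val: { id: 1, name: 'pizza' } },                // initial result
  { new_val: { id: 2, name: 'tacos' } },                // initial result
  { new_val: { id: 3, name: 'ramen' }, old_val: null }, // live insert
];

const emitted = [];
for (const m of messages) {
  handleChange(m, (event, row) => emitted.push(row.name));
}
console.log(emitted.join(',')); // pizza,tacos,ramen
```

Because the initial rows and the subsequent changes arrive on the same cursor, a subscriber never needs a separate read query, and nothing can slip in between the read and the feed.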

(One complication is the case where you have 500 processes on machine A that want to read from a RethinkDB server on machine B. In that case there is a difference in network traffic between the two solutions. If you put one client on machine A reading from a changefeed and pushing to the processes, each change gets sent from B to A once and is then transferred to the processes locally; in the other case each change is sent 500 times over the network. If the network connection between A and B is slow compared to inter-process transfer on machine A, that matters a lot. The best way to resolve it is to add a proxy node on machine A and open the 500 changefeeds against that node, since RethinkDB will deduplicate the messages sent to the proxy.)
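That proxy setup might look like the following command sketch (`machineB` is a placeholder for the server's address; 29015 and 28015 are RethinkDB's default intracluster and client driver ports):

```shell
# On machine A: start a proxy node that joins the cluster on machine B.
# The proxy holds no data; it routes queries and receives each
# changefeed message from B only once.
rethinkdb proxy --join machineB:29015

# The 500 client processes on machine A then connect to the proxy at
# localhost:28015 and each open their own changefeed.
```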

Upvotes: 4
