Marcelo Glasberg

Reputation: 30909

Firestore chat-app: Is this a valid document structure for multi-recipient messages?

Suppose a chat app has 10 million Firebase users, and hundreds of millions of messages.

I have a Firestore collection containing messages represented as documents in a time-series, and each of these messages may be received and viewed by up to 100 of these users. Please note, these users are not organized in stable groups, since each message may have a completely different set of users that receive it.

I need to be able to find, very efficiently (in terms of time and cost), all messages after some specific time, directed to some specific user.

My first (failed) attempt was to list the recipient users in a recipients array field, for example:

sender: user3567381
dateTime : 2019-01-24T20:37:28Z
recipients : [user1033029, user9273842, user8293413, user6273581]

However, that will not allow me to do my queries efficiently.

As a second failed attempt, since Firestore is schemaless, I thought about making each user a field, like this:

sender: user3567381
dateTime : 2019-01-24T20:37:28Z
user1033029 : true
user9273842 : true
user8293413 : true
user6273581 : true

Then, for example, if I want to know all messages for user 8293413 after 3:00 PM today, I could do it like this:

messages.where("user8293413", "==", true).where("dateTime", ">=", "2019-01-24T15:00:00Z")

This is a composite-index query, and it would need one index per user. Unfortunately, there is a limit of 200 composite indexes per database.

To solve this, my current attempt is to turn the date into values of the user fields, like this:

sender: user3567381
dateTime : 2019-01-24T20:37:28Z
user1033029 : 2019-01-24T20:37:28Z
user9273842 : 2019-01-24T20:37:28Z
user8293413 : 2019-01-24T20:37:28Z
user6273581 : 2019-01-24T20:37:28Z

Now, if I want to know all messages for user 8293413 after 3:00 PM today, I could do it like this:

messages.where("user8293413", ">=", "2019-01-24T15:00:00Z")

Note this is now a single-field index.

From the documentation I know that Firestore creates single-field indexes for all fields, which means it will create an index for user8293413 specifically. This means the search will be fast, right? And that the number of reads will be kept to a minimum (one read per message).

However, since I have 10 million users, Firestore will have to create 10 million single-field indexes (assuming all users receive messages) for the entire database.

From the documentation Firestore has these limitations:

By reading the above, these call my attention:

However, they state that the limitation is for each document, not for each database. And I only have millions of indexes for the database, not for each document.

Is that a problem? Will that many indexes affect performance? How about the storage cost of all these indexes? Is Firebase prepared at all for a large total number of indexes per database?

Upvotes: 6

Views: 1014

Answers (1)

Thingamajig

Reputation: 4465

This comes many months later, but for any future readers: it does seem like the first attempt would likely work best.

Using a single static field for the timestamp and a single static field for the recipients means the number of indexes stays negligible, and you won't have to think about them.

Your goal seems to be finding all messages for a given user. As you put it:

"For example, if I want to know all messages for user 8293413 after 3:00 PM today, I could do it like this:"

In pseudocode, that would simply look like this:

firestore.collection('messages').where('recipients', 'array-contains', userId).where('dateTime', '>', '2019-01-24T15:00:00Z').get()

This should be easy enough on performance: Firestore is optimized for the operators it provides, e.g. '==', '>=', 'array-contains'.
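As a rough sketch of the query above, assuming the namespaced Firestore JavaScript SDK and the field names from the question (recipients, dateTime): the helper name messagesForUserSince and the db parameter are hypothetical, introduced only for illustration. Combining array-contains with a range filter on another field should need one composite index on recipients and dateTime, but that is a single index for the whole collection, not one per user.

```javascript
// Hypothetical helper (not part of any SDK): builds the
// "messages for one user after a given time" query.
// `db` is assumed to be a Firestore instance, e.g. firebase.firestore().
function messagesForUserSince(db, userId, since) {
  return db
    .collection('messages')
    .where('recipients', 'array-contains', userId) // user is one of the recipients
    .where('dateTime', '>', since)                 // only messages after `since`
    .orderBy('dateTime');                          // oldest first
}
```

Usage would look something like `const snapshot = await messagesForUserSince(db, 'user8293413', '2019-01-24T15:00:00Z').get();`. The `since` value has to be of the same type as the stored dateTime field (string or timestamp), otherwise the comparison will not match anything.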

Upvotes: 1
