Reputation: 123
First of all thank you to anybody reading through this and offering any advice and help. It is much appreciated.
I'm developing a small custom CRM (ouch) for my father's business (he's a specialty contractor) and I'm using Firestore for my database. It is supposed to be very lean, with not much "bling", but streamlined to his specialty contracting business, whose process is very hard to fit into any off-the-shelf CRM. I have gotten quite far and have a decent-sized implementation, but am now running into some very fundamental issues as everything expands.
I admit that only having experience with relational databases (and not much of that, either) left me scratching my head a few times when setting up my database structure, and I am running into some issues with Firestore. I'm also a fairly novice developer, and I feel I'm tackling something that is way out of my league (but there's not much turning around now, being a year into this journey).
As of right now I'm using top-level collections for everything I am presenting here. I recently started using subcollections for some other minor features and began questioning whether I should apply that everywhere. A big problem I foresee is that, because I want to query in a multitude of ways, I am already consuming almost 100 composite indexes, and there is still lots to add. I need to reduce the number of composite indexes that my current and future data structure requires.
So I am somewhat certain that my data model is probably deeply flawed and needs to be improved/optimized/changed (which I don't mind doing, if that's what it takes, but I'm lost on the "how"). I don't need a specific solution, just some general pointers on what approaches are available. I think I might be lacking an "aha" moment; once I understand a pattern, I can usually apply it in other areas.
I will make my "Sales Leads Collection" the central concern of this post, as it has the most query variations.
So I have a mostly top-level collection structure like the one below, but I want to preface it with this: besides writing IDs into other documents, I also "stash" an entire "Customer" or "Sales Rep" object/document inside other documents, and I have Cloud Functions that iterate through the affected documents when there are updates, etc. (This avoids extra reads; i.e. when I read a SalesLead, I don't need to read the SalesRep and Customer documents, because they are also stashed/nested in the SalesLead.)
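To illustrate the stashing, creating a lead looks roughly like this (a minimal sketch, not my exact code; the "customer"/"salesRep" field names and the snapshot variables are placeholders):

//Sketch: write a lead with the Customer and SalesRep documents "stashed" in it,
//so reading the lead later does not require reading those two documents again.
FirebaseFirestore db = FirebaseFirestore.getInstance();
Map<String, Object> lead = new HashMap<>();
lead.put("customerId", customerSnap.getId());
lead.put("salesRepId", salesRepSnap.getId());
lead.put("customer", customerSnap.getData()); //denormalized copy of the customer
lead.put("salesRep", salesRepSnap.getData()); //denormalized copy of the sales rep
lead.put("status", "Open");
lead.put("leadCreatedOnDate", FieldValue.serverTimestamp());
db.collection("sales_leads").add(lead);

With that said, the collections look like this: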
| /sales_reps //SalesReps Collection
| /docId //Document ID
| + salesRepId (document id)
| + firstName
| + lastName
| + other employee/salesRep related info etc.
| /customers //Customers Collection
| /docId //Document ID
| + customerId (document id)
| + firstName
| + lastName
| + address
| + other customer-specific info such as contact details (phone, email), etc.
Logically, Sales Leads are of course linked to a Customer (one-to-many: one Customer can have many leads). I need to be able to "query" and "filter" by all the fields mentioned below.
| /sales_leads //SalesLeads Collection
| /docId //Document ID
| + customerId (document id) <- this is what I would query by to look for leads for a specific customer
| + salesRepId (document id) <- this is what I would query by to look for leads for a specific sales Rep
| + status <- (String: "Open", "Sold", "Lost", "On Hold")
| + progress <- (String: "Started", "Appointment scheduled", "Estimates created", etc. etc., )
| + type <- (String: New Construction or Service/Repair)
| + jobType <- (String: different types of jobs, related to what type of structures they are; 8-10 types right now)
| + reference <- (String: How the lead was referred to the company, i.e. Facebook, Google, etc. etc. );
| + many other (non queryable) data related to a lead, but not relevant here...
SalesEstimates are related to Leads in a one-to-many relationship (one lead can have many estimates). Estimates are not all that relevant for this discussion, but I wanted to include them anyhow; I query and filter estimates in a very similar way to leads (similar fields, etc.).
| /sales_estimates //SalesEstimates Collection
| /docId //Document ID
| + salesLeadId (document id) <- this is what I would query by to look for estimates for a specific lead
| + customerId (document id) <- this is what I would query by to look for estimates for a specific customer
| + salesRepId (document id) <- this is what I would query by to look for estimates for a specific sales Rep
| + specific sales Lead related data etc....
In my "Sales Lead List" on the client, I have some Drop Down Boxes as Filters, that contain Values (i.e. Sales Reps) but also haven an Option/Value "All" to negate any filtering.
So I would start assembling a query:
Query query = db.collection("sales_leads");
//Rep
if (!salesRepFilter.equals("All")) { //Typically only Managers/Supervisors would be able to see "all leads", whereas for a SalesRep this would be set to his own ID by default.
query = query.whereEqualTo("salesRepId", salesRepId);
}
//Lead Status (Open, Sold, Lost, On Hold)
if (!statusFilter.contains("All")) {
query = query.whereEqualTo("status", statusFilter);
}
//Lead Progress
if (!progressFilter.contains("All")) {
query = query.whereEqualTo("progress", progressFilter);
}
//Lead Type
if (!typeFilter.contains("All")) {
query = query.whereEqualTo("leadType", typeFilter);
}
//Job Type
if (!jobTypeFilter.contains("All")) {
query = query.whereArrayContains("jobTypes", jobTypeFilter);
}
//Reference
if (!referenceFilter.contains("All")) {
query = query.whereEqualTo("reference", referenceFilter);
}
Additionally, I might want to reduce the whole query to a single customer (this typically means that all other filters are skipped and "all leads for this customer" are shown). This would happen if the user opens the Customer Page/Details and clicks on something like "Show Leads for this Customer".
//Filter by Customer (when entering my SalesLead List from a Customer Card/Page where user clicked on "Show Leads for this Customer")
if (filterByCustomer) {
query = query.whereEqualTo("customerId", customerFilter);
}
//And at last I want to be able to query the date Range (when the lead was created) and also sort by "oldest" or "newest"
//Date Range
query = query.whereGreaterThan("leadCreatedOnDate", filterFromDate)
.whereLessThan("leadCreatedOnDate", filterToDate);
//Sort Newest vs Oldest
if (sortByNewest) { //either newest or oldest; newest first means descending by creation date
query = query.orderBy("leadCreatedOnDate", Query.Direction.DESCENDING);
} else {
query = query.orderBy("leadCreatedOnDate", Query.Direction.ASCENDING);
}
And that would complete my query on sales leads. This all works great right now, but I am anxious about going forward and ultimately hitting the composite index limitation. I don't have an exact number, but I am probably entertaining 25-30 composite indexes just for my sales_leads collection. (Yikes!)
Not only are there many fields to query by, the number of composite indexes required is multiplied by the combinations of filters that can be active at the same time: with six optional filters plus the date ordering, that is potentially one index per filter combination. (UGH)
I need to be able to query all leads and then filter them by the fields mentioned above (when describing my sales_leads collection).
So instead of keeping all of these as top-level collections, I am guessing that I should somehow restructure my database using subcollections, but I have tried modeling this with different approaches and always seem to hit a wall.
I suppose I could have "sales_leads" as a subcollection under each customer document and use a collection group query to retrieve "all leads", but those require composite indexes too, right? So it would just be a tradeoff for that one searchable field. (..hits wall..)
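For reference, that collection group query would look something like this (just a sketch; it assumes every customer's subcollection is named "sales_leads" and that the queried fields are indexed with collection group scope):

//Sketch: query every "sales_leads" subcollection across all customers.
//Collection group queries need indexes scoped to the collection group,
//so the composite index question comes right back.
Query allLeads = db.collectionGroup("sales_leads")
.whereEqualTo("salesRepId", salesRepFilter)
.orderBy("leadCreatedOnDate", Query.Direction.DESCENDING);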
Sorry for the length. I hope it's readable. I appreciate any help, feedback and input. I'm in a very anxious and frustrated position.
If this doesn't work, I might need to consider professional consultation.
Thanks!
Upvotes: 0
Views: 224
Reputation: 12385
Here are a few things I think will help you.
First, watch the AWS re:Invent 2018: Amazon DynamoDB Deep Dive on YouTube. It's about DynamoDB but DynamoDB is a NoSQL database very similar to Firestore and the concepts universally apply. Midway through the video, Rick uses a company like yours as an example and you may be surprised to see how effectively he can reduce query count simply through data modeling.
Second, familiarize yourself with Firestore's index merging. In situations like yours, it may be better to define your composite indices manually, or at least audit them manually, because creating them on demand doesn't guarantee the most efficient menu of composite indices. Remember, composite indices typically get created by following the links in the error messages your queries raise, in whatever order you happen to run them; if a later query could be served more cheaply by restructuring or dropping an earlier index, Firestore will not go back and clean that up for you. You have to.
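As a concrete (hypothetical) illustration using the field names from the question: if you define one composite index on (status, leadCreatedOnDate) and another on (salesRepId, leadCreatedOnDate), Firestore can merge those two smaller indexes to serve a query that combines both equality filters with the date ordering, instead of needing a dedicated three-field index for that particular filter combination:

//Can be served by merging the (status, leadCreatedOnDate) and
//(salesRepId, leadCreatedOnDate) indexes, assuming their directions match the orderBy.
Query merged = db.collection("sales_leads")
.whereEqualTo("status", "Open")
.whereEqualTo("salesRepId", salesRepFilter)
.orderBy("leadCreatedOnDate", Query.Direction.DESCENDING);

The same idea extends to the other equality filters, which is why the per-combination index count can collapse dramatically once the indexes are planned around merging.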
I'm highly suspicious of the fact that the sales-lead query consumes 25-30 composite indices; that number seems far too high to me given how many fields in the documents are indexed. Before you do anything—after having watched the video and studied index merging, of course—I'd focus entirely on this collection. You must be completely certain of the maximum number of composite indices this collection needs to consume. Perhaps create a dummy collection and experiment with index merging and really understand how it works because this alone may solve all of your problems. I would be shocked if Firestore couldn't handle your company's use case.
Third, don't be afraid to denormalize your data. The fundamental premise of NoSQL is denormalization: storage should be your least concern, and computation/operations should be your greatest concern. If you can reduce your query count by duplicating data over multiple documents in multiple collections, then that is simply what you must do if the alternative is hitting the 200-composite-index limit.
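For example, the fan-out your question already hints at (a Cloud Function keeping the stashed customer copies in sync) boils down to something like this rough sketch; the "customer" field name and the updatedCustomer snapshot are assumptions, and in practice it would run server-side in that function:

//Rough sketch: when a customer document changes, copy the new data into every
//lead that stashes that customer. Note a WriteBatch is limited to 500 writes.
db.collection("sales_leads")
    .whereEqualTo("customerId", updatedCustomer.getId())
    .get()
    .addOnSuccessListener(snapshot -> {
        WriteBatch batch = db.batch();
        for (DocumentSnapshot leadDoc : snapshot.getDocuments()) {
            batch.update(leadDoc.getReference(), "customer", updatedCustomer.getData());
        }
        batch.commit();
    });

Storage is cheap; paying for a few duplicated fields is usually far better than paying for extra reads or extra indexes on every query.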
Upvotes: 1