Accessing parent data for each DocumentReference in collection group query

I have a collection of companies and inside of each company document, I have a collection of appointments. I want to loop through all appointments of all companies in a cloud function, so I am using the following collection group query:

db.collectionGroup('appointments')
    .get()
    .then((querySnapshot: any) => {
        querySnapshot.forEach((appointmentDoc: any) => {
            const appointment: Appointment = appointmentDoc.data();
            appointmentDoc.ref.parent.parent.get().then((companyDoc: any) => {
                const company: Company = companyDoc.data();
                ...
            });
        });
     });

As you can see, in each iteration, I am also getting data for the company that the appointment came from. This works, but I'm concerned about performance. If I have 500 appointments, then isn't this method basically making 501 calls to the database (1 for the appointments and then getting the company data for all 500 appointments)? Is there a better way I can access that parent data so I'm not making all those extra calls? Would be great if I can do this in a way that scales.

Upvotes: 0

Answers (4)

LeadDreamer

Reputation: 3499

I use a very hierarchical structure, which looks like it would have similar problems, BUT...

...with a NoSQL database like Firestore, you have to DROP the SQL mantra of DRY. If the data is static (for example, whatever "company" data you actually need for an appointment), you absolutely can and should COPY THAT DATA into the documents that use it.

For example, you could quite trivially add to the appointment document the structure:

interface Appointment {
  // ... other appointment fields ...
  company: {
    id: string;
    name: string;
    location: string;
  };
}

Yes, this uses extra storage. So? The storage cost of a few small duplicated fields is negligible, whereas every fetch of a separate company document is billed as a read. Since this data isn't changing, it's much more efficient to copy it into the appointment document when the appointment is created.

Document fetches should be reserved for dynamic data.
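As a rough sketch of that idea, the company fields can be copied in at the moment the appointment is built. The helper name `withCompanySummary` and the `title`/`startsAt` fields are hypothetical; only `id`, `name` and `location` come from the structure above.

```typescript
// Hypothetical shapes for illustration; only id/name/location come from the answer.
interface Company {
  id: string;
  name: string;
  location: string;
}

interface AppointmentInput {
  title: string;    // hypothetical field
  startsAt: string; // hypothetical field, ISO timestamp
}

interface AppointmentDoc extends AppointmentInput {
  company: { id: string; name: string; location: string };
}

// Copy only the static company fields that an appointment needs.
function withCompanySummary(input: AppointmentInput, company: Company): AppointmentDoc {
  return {
    ...input,
    company: { id: company.id, name: company.name, location: company.location },
  };
}

const acme: Company = { id: "acme", name: "Acme Ltd", location: "Berlin" };
const appointmentDoc = withCompanySummary(
  { title: "Kickoff", startsAt: "2024-01-15T09:00:00Z" },
  acme
);
```

The resulting object is what you would write to the company's appointments subcollection, so a later collection group query returns the company fields without a second read.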

Upvotes: 1

LeadDreamer

Reputation: 3499

Another point: the path of a document reference (`doc.ref.path`) is a string holding the fully-qualified, '/'-separated path to the document, relative to the database root:

topcollection/topdocumentId/nextcollection/nextdocumentId/bottomcollection/bottomdocumentId

...and you can directly parse this string to find collection names and document IDs anywhere up the path to the document. I use this quite a bit as well.
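A minimal sketch of that parsing, assuming the path alternates collection name and document ID with no leading root segment (the form `ref.path` returns). The function name `parseDocumentPath` and the sample IDs are made up for illustration.

```typescript
interface PathSegment {
  collection: string;
  id: string;
}

// Split a document path into (collection, documentId) pairs, top-most first.
function parseDocumentPath(path: string): PathSegment[] {
  const parts = path.split("/").filter((p) => p.length > 0);
  // A document path always has an even number of segments.
  if (parts.length === 0 || parts.length % 2 !== 0) {
    throw new Error(`Not a document path: ${path}`);
  }
  const segments: PathSegment[] = [];
  for (let i = 0; i < parts.length; i += 2) {
    segments.push({ collection: parts[i], id: parts[i + 1] });
  }
  return segments;
}

// Hypothetical path of an appointment returned by the collection group query.
const segments = parseDocumentPath("companies/acme/appointments/a1");
const companyId = segments[0].id;
```

This gives you the parent ID without any read. You still need the copied fields or a fetch to get the company's data.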

Upvotes: 0

Doug Stevenson

Reputation: 317352

Firestore doesn't actually bill you based on number of queries. It's based on number of document reads. So, if you have 500 appointments, your code is going to read 1000 documents, since it's reading a company document once for each appointment document.

What you can do instead is only read each company document just once total, not once for each appointment for that company. You can maintain a cache in memory for that, using something like this:

// cache of companies identified by their document ID
const companies: { [key: string]: Company } = {}

db.collectionGroup('appointments')
    .get()
    .then((querySnapshot: any) => {
        querySnapshot.forEach((appointmentDoc: any) => {
            const appointment: Appointment = appointmentDoc.data();
            const parentRef = appointmentDoc.ref.parent.parent
            const companyId = parentRef.id
            let company: Company
            if (companies[companyId]) {
                company = companies[companyId]
                // work with cached company here
            }
            else {
                parentRef.get().then((companyDoc: any) => {
                    company = companyDoc.data();
                    companies[companyId] = company
                    // work with queried company here
                });
            }
        });
     });

This is incomplete, though: the inner query is still asynchronous, so the appointment iterator can request the same company several times before the first response has populated the cache. You will have to serialize the inner query somehow, or group the appointments by company ID and iterate the groups so that you don't fetch a company document more than once.

But I hope you get the idea here that using a memory cache can save you document reads.
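One possible way to close that gap, sketched here under assumptions, is to cache the pending promise rather than the resolved value, so concurrent requests for the same company share a single read. `DocRefLike`, `SnapshotLike` and `createCachedLoader` are hypothetical names; the interfaces only mimic the `path`, `get()` and `data()` members of the real SDK objects so the example runs without Firestore.

```typescript
// Minimal stand-ins for the SDK's DocumentSnapshot / DocumentReference.
interface SnapshotLike<T> {
  data(): T | undefined;
}
interface DocRefLike<T> {
  path: string;
  get(): Promise<SnapshotLike<T>>;
}

// Returns a loader that fetches each referenced document at most once.
function createCachedLoader<T>(): (ref: DocRefLike<T>) => Promise<T | undefined> {
  const cache = new Map<string, Promise<T | undefined>>();
  return (ref) => {
    let pending = cache.get(ref.path);
    if (!pending) {
      // Store the promise immediately so later callers reuse the in-flight read.
      pending = ref.get().then((snap) => snap.data());
      cache.set(ref.path, pending);
    }
    return pending;
  };
}

// Usage sketch with a fake reference that counts its reads.
let reads = 0;
const fakeCompanyRef: DocRefLike<{ name: string }> = {
  path: "companies/acme",
  get: async () => {
    reads += 1;
    return { data: () => ({ name: "Acme Ltd" }) };
  },
};

const loadCompany = createCachedLoader<{ name: string }>();
const loaded = Promise.all([
  loadCompany(fakeCompanyRef),
  loadCompany(fakeCompanyRef),
  loadCompany(fakeCompanyRef),
]);
```

In the real function you would pass `appointmentDoc.ref.parent.parent` to the loader. Note that in this sketch a failed read stays cached as a rejected promise, so you may want to evict entries on error.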

Upvotes: 1

Frank van Puffelen

Reputation: 598708

There is no way to get the parent documents at the same time as the documents from the appointments collection.

The only thing you can do is gather the parent document IDs into batches of 10 and then do an IN query with each batch. But I doubt it's worth the effort, because the wire traffic is likely pretty much the same.
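For illustration, the batching step could look like the sketch below. The `chunk` helper and the generated IDs are hypothetical, and the batch size of 10 is taken from the paragraph above rather than checked against current limits.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical company IDs, e.g. collected from appointmentDoc.ref.parent.parent.id.
const companyIds = Array.from({ length: 23 }, (_, i) => `company-${i}`);
const uniqueIds = Array.from(new Set(companyIds)); // drop duplicates first
const batches = chunk(uniqueIds, 10);

// Each batch could then be used in a query along the lines of
//   db.collection('companies').where(FieldPath.documentId(), 'in', batch).get()
// (assumes a top-level 'companies' collection, as in the question).
```

Deduplicating before batching matters more than the batching itself, since it is what keeps each company down to one read.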

Note that performance does not usually correlate linearly with the number of calls though, so test before trying to optimize it. Also see "Google Firestore - how to get document by multiple ids in one round trip?".

Also: do consider why you need 500 documents at once. You'll typically want to load a screenful of data, and this seems a lot more. For general hints about data modeling in Firestore, I recommend the first bunch of episodes of Getting to know Cloud Firestore.

Upvotes: 1
