Reputation: 5049
I have over 20k objects in my Firebase Realtime Database. I now need to fetch all of these objects and do some work on them. The problem is that the server runs out of memory every time I do it. This is my current code:
sendEmail.get('/:types/:message', cors(), async (req, res, next) => {
  console.log(5);
  const types = JSON.parse(req.params.types);
  console.log('types', types);
  let recipients = [];
  let mails = [];
  if (types.includes('students')) {
    console.log(1);
    const tmpUsers = await admin.database().ref('Users').orderByChild('student').equalTo(true).once('value').then(r => r.val()).catch(e => console.log(e));
    recipients = recipients.concat(tmpUsers);
  }
  if (types.includes('solvers')) {
    console.log(2);
    let tmpUsers = await admin.database().ref('Users').orderByChild('userType').equalTo('person').once('value').then(r => r.val()).catch(e => console.log(e));
    tmpUsers = tmpUsers.concat(arrayFromObject(await admin.database().ref('Users').orderByChild('userType').equalTo('company').once('value').then(r => r.val()).catch(e => console.log(e))));
    recipients = recipients.concat(tmpUsers);
  }
});
So I have two options: streaming, or limiting the response with startAt and endAt. But to limit the responses I need to know exactly how many objects I have, and to find that out I would have to download the whole collection... You see my problem now. How can I find out how many documents I have without downloading the whole collection?
Upvotes: 1
Views: 1122
Reputation: 6926
You could try paginating your query by combining limitToFirst/limitToLast with startAt/endAt.
For example, you could perform the first query with limitToFirst(1000), then take the last key from the returned list and use it with startAt(key) and another limitToFirst(1000), repeating until you reach the end of the collection.
In node.js, it might look something like this (untested code):
let recipients = [];
let tmpUsers = await next();
recipients = filter(recipients, tmpUsers);
// startAt is inclusive, so when this reaches the last result there will only be 1
while (tmpUsers.length > 1) {
  const lastKey = tmpUsers[tmpUsers.length - 1].key;
  tmpUsers = await next(lastKey);
  if (tmpUsers.length > 1) { // Avoid duplicating the last result
    recipients = filter(recipients, tmpUsers);
  }
}

async function next(startAt) {
  let query = admin.database().ref('Users').orderByKey();
  if (startAt) {
    query = query.startAt(startAt);
  }
  const snapshot = await query.limitToFirst(1000).once('value');
  // Turn the snapshot into an array of { key, ...value } objects so the keys are preserved
  const users = [];
  snapshot.forEach(child => {
    users.push({ key: child.key, ...child.val() });
  });
  return users;
}

function filter(array1, array2) {
  // TODO: Filter the results here as we can't combine orderByChild/orderByKey
  return array1.concat(array2);
}
The problem with this is that you won't be able to use database-side filtering, so you'd need to filter the results manually, which might make things worse, depending on how many items you need to keep in the recipients variable at a time.
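For illustration, a minimal sketch of what that manual filtering could look like, assuming the user objects carry the student and userType fields from your original queries and that types is still in scope:

function filter(array1, array2) {
  // Re-apply the checks that equalTo() used to perform on the database side
  const matches = array2.filter(user =>
    (types.includes('students') && user.student === true) ||
    (types.includes('solvers') && (user.userType === 'person' || user.userType === 'company'))
  );
  return array1.concat(matches);
}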
Another option would be to process them in batches (of 1000, for example), pop them from the recipients array to free up resources, and then move on to the next batch. It depends entirely on what actions you need to perform on the objects, and you'll need to weigh up whether it's actually necessary to process (and keep in memory) the entire result set in one go.
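As a rough sketch of that batched approach, reusing next() from the snippet above and assuming a hypothetical processBatch() that does whatever work you need on each batch (sending the emails, for example):

let batch = await next();
while (batch.length > 0) {
  await processBatch(batch); // hypothetical: handle this batch, then let it be garbage-collected
  const lastKey = batch[batch.length - 1].key;
  // Fetch the next page; startAt is inclusive, so drop the record we already handled
  batch = (await next(lastKey)).filter(user => user.key !== lastKey);
}

This way only one batch is held in memory at a time instead of the full 20k-object result set.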
Upvotes: 3
Reputation: 6364
You don't need to know the size of the collection to process it in batches.
You can do it by ordering the items by key, limiting to 1000 or so, and then starting the next batch at the last key of the previous batch.
If you still want to know the size of the collection, the only good way is to maintain the count in a separate node and keep it updated whenever the collection changes.
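For example, a minimal sketch of maintaining such a counter with a transaction, assuming a hypothetical counters/users node that you update in the same places where you add or remove users:

// Call with +1 when a user is added and -1 when one is removed
function updateUserCount(delta) {
  return admin.database().ref('counters/users')
    .transaction(current => (current || 0) + delta);
}

Reading that single node is then enough to know how many users exist, without downloading the whole collection.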
Upvotes: 2