Reputation: 5049
I have over 20k objects in my Firebase Realtime Database. I now need to fetch all of these objects and do some work on them. The problem is that the server runs out of memory every time I do it. This is my current code:
sendEmail.get('/:types/:message', cors(), async (req, res, next) => {
  console.log(5);
  const types = JSON.parse(req.params.types);
  console.log('types', types);
  let recipients = [];
  let mails = [];
  if (types.includes('students')) {
    console.log(1);
    const tmpUsers = await admin.database().ref('Users').orderByChild('student').equalTo(true).once('value').then(r => r.val()).catch(e => console.log(e));
    recipients = recipients.concat(tmpUsers);
  }
  if (types.includes('solvers')) {
    console.log(2);
    let tmpUsers = await admin.database().ref('Users').orderByChild('userType').equalTo('person').once('value').then(r => r.val()).catch(e => console.log(e));
    tmpUsers = tmpUsers.concat(arrayFromObject(await admin.database().ref('Users').orderByChild('userType').equalTo('company').once('value').then(r => r.val()).catch(e => console.log(e))));
    recipients = recipients.concat(tmpUsers);
  }
});
So I have two options: streaming, or limiting the response with startAt and endAt. But to limit the responses I need to know exactly how many objects I have, and to find that out I would have to download the whole collection... You see my problem now. How can I find out how many documents I have without downloading the whole collection?
Upvotes: 1
Views: 1122
Reputation: 6926
You could try paginating your query by combining limitToFirst/limitToLast with startAt/endAt.
For example, you could perform the first query with limitToFirst(1000), then take the last key from the returned list and use it with startAt(key) and another limitToFirst(1000), repeating until you reach the end of the collection.
In node.js, it might look something like this (untested code):
let recipients = [];
let tmpUsers = await next();
recipients = filter(recipients, tmpUsers);
// startAt is inclusive, so when this reaches the last result there will only be 1
while (tmpUsers.length > 1) {
  const lastKey = tmpUsers[tmpUsers.length - 1].key;
  tmpUsers = await next(lastKey);
  if (tmpUsers.length > 1) { // Avoid duplicating the last result
    recipients = filter(recipients, tmpUsers);
  }
}

async function next(startAt) {
  let query = admin.database().ref('Users').orderByKey();
  if (startAt) {
    query = query.startAt(startAt);
  }
  const snapshot = await query.limitToFirst(1000).once('value');
  // Turn the snapshot into an array of { key, ...value } objects so the keys are preserved
  const users = [];
  snapshot.forEach(child => {
    users.push({ key: child.key, ...child.val() });
  });
  return users;
}

function filter(array1, array2) {
  // TODO: Filter the results here as we can't combine orderByChild/orderByKey
  return array1.concat(array2);
}
The problem with this is that you won't be able to use database-side filtering, so you'd need to filter the results manually, which might make things worse, depending on how many items you need to keep in the recipients variable at a time.
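For illustration, a minimal sketch of what that manual filtering could look like, assuming the user objects carry the student and userType fields from your original queries and that types is still in scope:

function filter(array1, array2) {
  // Re-apply the checks that equalTo() used to perform on the database side
  const matches = array2.filter(user =>
    (types.includes('students') && user.student === true) ||
    (types.includes('solvers') && (user.userType === 'person' || user.userType === 'company'))
  );
  return array1.concat(matches);
}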
Another option would be to process them in batches (of 1000, for example), pop them from the recipients array to free up resources, and then move on to the next batch. It depends entirely on what actions you need to perform on the objects, and you'll need to weigh up whether it's actually necessary to process (and keep in memory) the entire result set in one go.
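As a rough sketch of that batched approach, reusing next() from the snippet above and assuming a hypothetical processBatch() that does whatever work you need on each batch (sending the emails, for example):

let batch = await next();
while (batch.length > 0) {
  await processBatch(batch); // hypothetical: handle this batch, then let it be garbage-collected
  const lastKey = batch[batch.length - 1].key;
  // Fetch the next page; startAt is inclusive, so drop the record we already handled
  batch = (await next(lastKey)).filter(user => user.key !== lastKey);
}

This way only one batch is held in memory at a time instead of the full 20k-object result set.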
Upvotes: 3
Reputation: 6364
You don't need to know the size of the collection to process it in batches.
You can do it by ordering the items by key, limiting to 1000 or so, and then starting the next batch at the last key of the previous batch.
If you still want to know the size of the collection, the only good way is to maintain the count in a separate node and keep it updated whenever the collection changes.
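For example, a minimal sketch of maintaining such a counter with a transaction, assuming a hypothetical counters/users node that you update in the same places where you add or remove users:

// Call with +1 when a user is added and -1 when one is removed
function updateUserCount(delta) {
  return admin.database().ref('counters/users')
    .transaction(current => (current || 0) + delta);
}

Reading that single node is then enough to know how many users exist, without downloading the whole collection.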
Upvotes: 2