Is it always worth using Firebase Cloud Functions to avoid client-side work?

Question

I've recently been using Firebase Cloud Functions to move a lot of work from the client to the server, reducing the data cost for the user. But I've started to wonder whether it's worth it, or whether a better database structure could solve the problem instead.

I have a social app where users can work out and post their results; you can follow users and do all the "typical" social media stuff. My problem appears when I want to implement pagination, retrieving the last X workouts that I should show each user in their feed.

My question is: how expensive is it to update between 1 and 1000 (worst case) fields in the database from a common event trigger in Firebase Cloud Functions? Is it expensive enough that I should avoid it and look for a better-performing approach, even if that approach costs more on the client side?

I will explain it looking at my example:

Database Structure

"privateUserData" : {
    "user1" : {
      "messagingTokens": {
        "someToken": true,
        "someToken2": true,
      },
      "accountCreationDate" : 1495819217216,        
      "email" : "abcd@gmail.com",
      "followedBy" : {
        "user2": true,
        "user3": true,
      },
      "following" : {
        "user2": true,
        "user3": true,
      },        
      "lastLogin" : 1498654134543,
      "photoUrl" : "photo.png",       
      "username" : "Francisco Durdin Garcia"
    },
  },
  "publicUserData": {
    "user1": {
      "username": "someUserName",
      "followersCount": 5,
      "followingCount": 1,
      "photoUrl" : "someUrl"
    }
    ...
  },
  "workouts" : {
    "workout1" : {
      "likes": {
        "user1": true,
        "user2": true,
        ...
      },
      "followers": {
        "user1": true,
        "user2": true,
        ...
      },
      "comments": {
        "comment1": {
          "owner": "user1",
          "content": "somecomment",
          "time": 1493153530311,
          "replies": {
            "reply1": {
              "owner": "user1",
              "content": "somecomment",
              "time": 1493153530311,
            }
          }
        }
      },
      "authorUid" : "user1",
      "description" : "desc",
      "points" : 63,
      "time" : "00:03",
      "createdAt" : 1493153530311,
      "title" : "someTitle",
      "workoutJson" : "workoutJsonDataHere"
    }
  }

To build that feed, I would have to run an individual query for each user I follow.

The problem is that I can't do a "global" query and limit it to just X dataSnapshots. I can only filter a few workouts in each individual query:

mDatabase.child("workouts").orderByChild("authorUid").equalTo("userIFollow").limitToLast(10)

This query returns results filtered for just one userIFollow; it's not possible to apply it across all of them, so I have three options:

1. Create a table that stores the relation between user IDs and the workout IDs visible to them, with a timestamp value. But I would have to keep these values up to date through a Firebase Cloud Function, and if I follow a user with thousands of workouts, my Cloud Function would need to copy ALL OF THEM to the right reference.

This is the way I wanted to go, but I don't know if it's the right approach in terms of client-side cost.

2. Add a lastActivityTimeStamp to publicUserData and, filtering by it, retrieve just a few workouts from the users with the most recent activity, extending this query with pagination too.

3. Finally, I can always retrieve all the workouts from this user and filter on the client side. This will be expensive just once, because afterwards the cache will make everything easier.
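
For reference, option 1 is commonly called a fan-out. The following is a minimal sketch of the work such a trigger would do, not a definitive implementation. It assumes a hypothetical `feeds/{userId}/{workoutId}` path that is not part of the structure above, and it leaves the actual database write out. It builds one multi-path update object, so the size of the write grows with the number of followers, while only workout IDs and timestamps are copied rather than whole workouts.

```typescript
// Shape of /privateUserData/{user}/followedBy in the structure above.
type FollowerMap = Record<string, true>;

// Builds a multi-path update that adds one workout to every follower's feed.
// The "feeds/{userId}/{workoutId}" path is hypothetical, for illustration only.
function buildFeedFanOut(
  followedBy: FollowerMap,
  workoutId: string,
  createdAt: number
): Record<string, number> {
  const updates: Record<string, number> = {};
  for (const followerId of Object.keys(followedBy)) {
    // Only the ID and timestamp are copied, not the workout payload.
    updates[`feeds/${followerId}/${workoutId}`] = createdAt;
  }
  return updates;
}

// One new workout by user1, who is followed by user2 and user3.
const fanOut = buildFeedFanOut(
  { user2: true, user3: true },
  "workout1",
  1493153530311
);
console.log(Object.keys(fanOut).length); // one entry per follower
```

An object like this could then be passed to a single `update()` call on the database root, so the cost per new workout is one small write per follower.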

These are the ways I found to solve my problem, and my question remains: how expensive and how useful are Firebase Cloud Functions for copying large amounts of data on common triggers?

samthecodingman · Accepted Answer

From the way you worded your question, you seem familiar with the Database Cloud Functions for Firebase and it also seems that 'workouts' is your payload (the biggest chunk of data that you don't want to download repeatedly).

I would recommend the following approach based roughly off how GitHub's API works.

Prerequisites

In your /privateUserData/{user} data, you seem to have the list of followed user IDs (at /privateUserData/{user}/following). To make your queries simpler, I'd recommend implementing a list of workout IDs authored by that user (under something like /publicUserData/{user}/authorOf).

Implementation

I'd recommend building an HTTP Cloud Function at, say, https://FUNCTION_URL/followedWorkouts. When called, it would generate a list of workout IDs for a given user by checking who they follow, getting the list of workouts authored by each followed user, and returning them as one array. To identify the user, you could pass in their ID using a GET parameter such as ?user= or through some form of authentication. How you go about it is up to you.

The function should return data in the following (or similar) format (in this case I'm using JSON):

[{"id": "workoutId1", "lastMod": "1493153530311"}, {"id": "workoutId2", "lastMod": "1493153530521"}, ...]

id is the workout ID.
lastMod (short form of last modified) is the last time that workout's data was updated (from {workoutId}/lastModificationDate). See the 'caching' section below.
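
A minimal sketch of the collection step, under some assumptions: the data has already been read from the database, and the suggested authorOf index maps each workout ID to its last modification timestamp (that shape, and the names `FeedEntry` and `collectFollowedWorkouts`, are hypothetical). `lastMod` is kept as a number here so it can be sorted; it could be serialized as a string as in the sample response.

```typescript
// Entry returned to the client: workout ID plus last-modified timestamp.
interface FeedEntry {
  id: string;
  lastMod: number;
}

// following: contents of /privateUserData/{user}/following
// authorOf:  per user, a map of workoutId -> lastModificationDate
//            (assumed shape for the suggested authorOf index)
function collectFollowedWorkouts(
  following: Record<string, true>,
  authorOf: Record<string, Record<string, number>>
): FeedEntry[] {
  const entries: FeedEntry[] = [];
  for (const userId of Object.keys(following)) {
    // A followed user may not have authored anything yet.
    const authored = authorOf[userId] || {};
    for (const id of Object.keys(authored)) {
      entries.push({ id: id, lastMod: authored[id] });
    }
  }
  // Most recently modified first.
  return entries.sort((a, b) => b.lastMod - a.lastMod);
}

const feed = collectFollowedWorkouts(
  { user2: true, user3: true },
  {
    user2: { workoutId1: 1493153530311 },
    user3: { workoutId2: 1493153530521 },
  }
);
console.log(JSON.stringify(feed));
```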

Filtering

I'd also implement the following 'filters' on the Cloud Function:

Since (?since=): will return workout IDs that have been modified since that timestamp. (Say you downloaded some information at some time T; you would then set since=T to receive only workouts changed after that time.)
Max (?max=X): will return the X most recent entries.
Start At (?startAt=X): will return the most recent entries starting at the index X (I'd make it a 1-based index).

So if you wanted to grab the 10 most recent entries, you could call https://FUNCTION_URL/followedWorkouts?max=10 which would give you the IDs for the 1st-10th most recently updated workouts. For the next 'page' of entries, you would call https://FUNCTION_URL/followedWorkouts?startAt=11&max=10 which, with a 1-based index, would give you the 11th-20th most recently updated workout IDs.
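
The three filters can be sketched as one pure function applied to the collected entries. This is an illustration only: the names `FeedFilters` and `applyFilters` are hypothetical, parsing of the query string is left out, and startAt is treated as a 1-based position, so the second page of ten starts at 11.

```typescript
interface FeedEntry {
  id: string;
  lastMod: number;
}

interface FeedFilters {
  since?: number;   // only entries modified after this timestamp
  max?: number;     // maximum number of entries to return
  startAt?: number; // 1-based position of the first entry to return
}

function applyFilters(entries: FeedEntry[], filters: FeedFilters): FeedEntry[] {
  // Sort a copy, most recently modified first.
  let result = entries.slice().sort((a, b) => b.lastMod - a.lastMod);
  if (filters.since !== undefined) {
    const since = filters.since;
    result = result.filter((e) => e.lastMod > since);
  }
  // Convert the 1-based startAt into a 0-based array offset.
  const startAt = filters.startAt !== undefined ? filters.startAt : 1;
  const start = Math.max(startAt - 1, 0);
  const end = filters.max !== undefined ? start + filters.max : undefined;
  return result.slice(start, end);
}

// 25 sample workouts with increasing modification times.
const sample: FeedEntry[] = [];
for (let i = 1; i <= 25; i++) {
  sample.push({ id: `workout${i}`, lastMod: i });
}

const page1 = applyFilters(sample, { max: 10 });              // 1st-10th
const page2 = applyFilters(sample, { startAt: 11, max: 10 }); // 11th-20th
const recent = applyFilters(sample, { since: 23 });           // modified after 23
console.log(page1[0].id, page2[0].id, recent.length);
```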

Caching

As each workout is a payload, it doesn't make sense to download it multiple times. I would recommend caching this data to prevent this. In the response I suggested above, the field lastMod (last modified) allows you to check if a locally cached version needs updating. How you go about this is, yet again, up to you.
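
One possible way to do that check, shown as a sketch: it assumes the client keeps a simple map of workout ID to the lastMod value it last downloaded (the map and the name `findStaleWorkouts` are hypothetical). Only the IDs it returns would need their full workout payload fetched.

```typescript
interface FeedEntry {
  id: string;
  lastMod: number;
}

// Returns the IDs whose payload must be (re)downloaded: either missing
// from the local cache, or cached with an older lastMod than the server's.
function findStaleWorkouts(
  serverEntries: FeedEntry[],
  cachedLastMod: Record<string, number | undefined>
): string[] {
  return serverEntries
    .filter((entry) => {
      const cached = cachedLastMod[entry.id];
      return cached === undefined || cached < entry.lastMod;
    })
    .map((entry) => entry.id);
}

const stale = findStaleWorkouts(
  [
    { id: "w1", lastMod: 100 }, // unchanged since it was cached
    { id: "w2", lastMod: 200 }, // modified after it was cached
    { id: "w3", lastMod: 300 }, // never downloaded
  ],
  { w1: 100, w2: 150 }
);
console.log(stale);
```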

Extending

If you need more of these paginated feeds, you could name the function more generally, such as https://FUNCTION_URL/feeds, and pass in the feed type as a parameter, for example https://FUNCTION_URL/feeds?type=workouts. You could use this for things like followers, following, comments, etc.

Feel free to reach out if you need some more information.