MrPHP
MrPHP

Reputation: 972

NodeJS and Mongo line who's online

TL;DR

logging online users and reporting back a count (based on a mongo find)


We've got a saas app for schools and students, as part of this I've been wanting a 'live' who's online ticker.

Teachers from the schools will see the counter, and the students and parents will trigger it.

I've got a socket.io connect from the web app to a NodeJS app. Where there is lots of traffic, the Node/Mongo servers can't handle it, and rather than trow more resources at it, I figured it's better to optomise the code - because I don't know what I'm doing :D

with each student page load: Create a socket.io connection with the following object:

{
'name': 'student or caregiver name',
'studentID': 123456,
'schoolID': 123,
'role': 'student', // ( or 'mother' or 'father' )
'page': window.location
}

in my NODE script:

io.on('connection', function(client) {
    // if it's a student connection.. 
    if(client.handshake.query.studentID) {
        let student = client.handshake.query; // that student object
            student.online = new Date();
            student.offline = null;
        db.collection('students').updateOne({ 
           "reference": student.schoolID + student.studentID + student.role }, { $set: student 
        }, { upsert: true });


    }

    // IF STAFF::: just show count!
    if(client.handshake.query.staffID) {
      db.collection('students').find({ 'offline': null, 'schoolID':client.handshake.query.schoolID }).count(function(err, students_connected) {
          
          emit('online_users' students_connected);
       });
    }



    client.on('disconnect', function() {
        // then if the students leaves the page..
        if(client.handshake.query.studentID) {
            db.collection('students').updateMany({ "reference": student.reference }, { $set: { "offline": new Date().getTime() } })
            .catch(function(er) {});
         }

         // IF STAFF::: just show updated count!
         if(client.handshake.query.staffID) {
           db.collection('students').find({ 'offline': null, 'schoolID':client.handshake.query.schoolID }).count(function(err, students_connected) {
          
                emit('online_users' students_connected);
            });
         }
     });
});


What Mongo Indexes would you add, would you store online students differently (and in a different collection) to a 'page tracking' type deal like this? (this logs the page and duration so I have another call later that pulls that - but that's not heavily used or causing the issue.

If separately, then insert, then delete?

The EMIT() to staff users, how can I only emit to staff with the same schoolID as the Students?

Thanks!

Upvotes: 5

Views: 703

Answers (2)

Gandalf the White
Gandalf the White

Reputation: 2465

You have given a brief about the issue but no diagnosis on why the issue is happening. Based on a few assumptions I will try to answer your question.

First of all you have mentioned that you'd like suggestions on what Indexes can help your cause, based on what you have mentioned it's a write heavy system and indexes in principle will only slow the writes because on every write the Btree that handles the indexes will have to be updated too. Although the reads become way better specially in case of a huge collection with a lot of data.

So an index can help you a lot if your collection has let's say, 1 million documents. It helps you to skim only the required data without even doing a scan on all data, thanks to the Btree.

And Index should be created specifically based on the read calls you make.

For e.g.

{"student_id" : "studentID", "student_fname" : "Fname"}

If the read call here is based on student_id then create and index on that, and if multiple values are involved (equality - sort or anything) then create a compound index on those fields, giving priority to Equality field first and range and sort fields thereafter.

Now the seconds part of question, what would be better in this scenario.

This is a subjective thing and I'm sure everyone will have a different approach to this. My solution is based on a few assumptions.

Assumption(s)

The system needs to cater to a specific feature where student's online status is updated in some time interval and that data is available for reads for parents, teachers, etc.

The sockets that you are using, if they stay connected continuously all the time then it's that many concurrent connections with the server, if that is required or not, I don't know. But concurrent connections are heavy for the server as you would already know and unless that's needed 100 % try a mixed approach.

If it would be okay for you disconnect for a while or keep connection with the server for only a short interval then please consider that. Which basically means, you disconnect from the server gracefully, connect send data and repeat.

Or, just adopt a heartbeat system where your frontend app will call an API after set time interval and ping the server, based on that you can handle if the student is online or not, a little time delay, yes but easily scaleable.

Please use redis or any other in memory data store for such frequent writes and specially when you don't need to persist the data for long.

For example, let's say we use a redis list for every class / section of user and only update the timestamp (epoch) when their last heartbeat was received from the frontend.

In a class with 60 students, sort the students based on student_id or something like that.

Create a list for that class

For student_id which is the first in ascended student's list, update the epoch like this

LSET mylist 0 "1266126162661" //Epoch Time Stamp 

0 is your first student and 59 is our 60th student, update it on every heartbeat. Either via API or the same socket system you have. Depends on your use case.

When a read call is needed

LRANGE classname/listname 0 59

Now you have epochs of all users, maintain the list of students either via database or another list where you can simply match the indexes with a specific student.

LSET studentList 0 "student_id" //Student id of the student or any other data, I am trying to explain the logic

On frontend when you have the epochs take the latest epoch in account and based on your use case, for e.g. let's say I want a student to be online if the hearbeat was received 5 minutes back.

Current Timestamp - Timestamp (If less than 5 minutes (in seconds)) then online or else offline.

Upvotes: 4

MattM
MattM

Reputation: 1289

This won't be a complete answer without discussing the problem some more, but figured I'd post some general suggestions.

First, we should figure out where the performance bottlenecks are. Is it a particular query? Is it too many simultaneous connections to MongoDB? Is it even just too much round trip time per query (if the two servers aren't within the same data center)? There's quite a bit to narrow down here. How many documents are in the collection? How much RAM does the MongoDB server have access to? This will give us an idea of whether you should be having scaling issues at this point. I can edit my answer later once we have more information about the problem.

Based on what we know currently, without making any model changes, you could consider indexing the reference field in order to make the upsert call faster (if that's the bottleneck). That could look something like:

db.collection('students').createIndex({
  "reference": 1
},
{ background: true });

If the querying is the bottleneck, you could create an index like:

db.collection('students').createIndex({
  "schoolID": 1
},
{ background: true });

I'm not confident (without knowing more about the data) that including offline in the index would help, because optimizing for "not null" can be tricky. Depending on the data, that may lead to storing the data differently (like you suggested).

Upvotes: 1

Related Questions