Ganesh Rathinavel

Reputation: 1335

What platform/tech stack can help achieve seamless distributed logging and tracing for my system?

I’m looking for recommendations on platforms or tech stacks that can help us achieve robust distributed logging and tracing for our platform. Here's an overview of our system and requirements:

Our Platform

We have a distributed system with the following interconnected components:

  1. Web App built using Next.js:
    • Frontend: React
    • Backend: Node.js
  2. REST API Server using FastAPI.
  3. Python Library that runs on client machines and interacts with the REST API server.

What We Want to Achieve

When users report issues, we need a setup that lets us reconstruct what happened across every component involved.

For example, if a user encounters a REST API error while using our Python library, we want to trace the entire flow of that request across the Python library, REST API server, and any related services.
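To make the idea concrete, here is a minimal sketch of the kind of correlation we have in mind, not something we have running. It assumes the library sends a W3C-style `traceparent` header with each request and that both sides stamp their log lines with the trace ID taken from it. The helper names (`new_traceparent`, `trace_id_of`, `make_logger`) and the logger names `pylib` and `api` are hypothetical.

```python
import logging
import secrets


def new_traceparent() -> str:
    """Build a traceparent value in the form version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def trace_id_of(traceparent: str) -> str:
    """Extract the trace ID (second field) from a traceparent value."""
    _version, trace_id, _span_id, _flags = traceparent.split("-")
    return trace_id


def make_logger(component: str, trace_id: str) -> logging.LoggerAdapter:
    """Return a logger that attaches the given trace ID to every record."""
    logger = logging.getLogger(component)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(name)s trace_id=%(trace_id)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logging.LoggerAdapter(logger, {"trace_id": trace_id})


# Client side (hypothetical library code): create the header once per request.
headers = {"traceparent": new_traceparent()}
client_log = make_logger("pylib", trace_id_of(headers["traceparent"]))
client_log.error("upload failed: server returned an error")

# Server side (hypothetical handler): read the same header from the request.
server_log = make_logger("api", trace_id_of(headers["traceparent"]))
server_log.error("upload handler raised an exception")
```

Both log lines carry the same `trace_id`, so searching for that one value would return the library and API server entries for the failing request together.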

Specific Questions

  1. Tracking User Actions Across the Platform

    • Are there any tools or platforms that can trace a user’s journey or timeline of activities across multiple applications?
    • Can these tools link logs/errors to an individual user’s actions across the system?
  2. Handling Guest Users and Identity Mapping

    • Many users interact as guests (anonymous) before authenticating. Is there a way to associate logs/errors from their guest activities to their identity once they log in, so all their past and future actions are unified under a single identity?
  3. Unifying Logs Across the Platform
    Here’s an example scenario we’re looking to address:

    • A user runs code in the Python library, and we log their actions.
    • The library prompts them to log in, and we log this event as well.
    • During login, they hit a REST API endpoint, but the login fails (e.g., an authentication error). Logs are captured from both the library and the API server.
    • Upon successful login, we assign an identity to the user and tag all their future logs with it.
    • The user uploads a file via the Python library, but the REST API server throws an error. Logs from both the library and the API server should be correlated.
    • Later, if the user reports an issue, we want to trace their actions across the entire platform to identify the root cause.
  4. Filtering Logs for Troubleshooting

    • Are there solutions that allow filtering logs/traces for a specific user or session to recreate the sequence of events leading to an issue?
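As a rough illustration of questions 2 to 4, the sketch below shows the behaviour we are after using a toy in-memory store: events are logged under an anonymous ID, the anonymous ID is aliased to a user ID at login, and a per-user query then returns the guest and authenticated events as one timeline. The `Timeline` class, its methods, and the IDs `anon-123` and `user-42` are all hypothetical and only stand in for whatever a real platform provides.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    actor_id: str  # anonymous ID or user ID at the time the event was logged
    source: str    # component that emitted the event, e.g. "pylib" or "api"
    message: str


@dataclass
class Timeline:
    events: list = field(default_factory=list)
    aliases: dict = field(default_factory=dict)  # anonymous_id -> user_id

    def log(self, actor_id: str, source: str, message: str) -> None:
        """Record an event under whatever identity is known right now."""
        self.events.append(Event(actor_id, source, message))

    def identify(self, anonymous_id: str, user_id: str) -> None:
        """Link a guest identity to a user once login succeeds."""
        self.aliases[anonymous_id] = user_id

    def for_user(self, user_id: str) -> list:
        """Return all events for a user, including earlier guest activity."""
        return [
            e for e in self.events
            if self.aliases.get(e.actor_id, e.actor_id) == user_id
        ]


timeline = Timeline()
timeline.log("anon-123", "pylib", "user ran code")
timeline.log("anon-123", "pylib", "login prompted")
timeline.log("anon-123", "api", "login failed: authentication error")
timeline.identify("anon-123", "user-42")  # successful login
timeline.log("user-42", "pylib", "file upload started")
timeline.log("user-42", "api", "file upload failed: server error")
timeline.log("anon-999", "pylib", "unrelated guest ran code")

for event in timeline.for_user("user-42"):
    print(event.source, event.message)
```

The alias is resolved at query time, so events logged before login do not need to be rewritten. Whether real tools resolve identities this way or by backfilling stored events is part of what we would like to learn.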

What We Are Considering

Are there platforms, open-source tools, or tech stack setups (commercial or otherwise) that you’d recommend for this?

We’re essentially looking for a distributed logging and tracing solution that can help us achieve this level of traceability and troubleshooting across our platform.

Would love to hear about what has worked for you or any recommendations you might have!

Thanks in advance!

Upvotes: 0

Views: 14

Answers (0)
