st.huber

Reputation: 1545

Show distinct number of users invoking API Gateway in CloudWatch dashboard

How can I get the distinct number of users for a given time range that have used my service? The number of users must be shown in a CloudWatch dashboard.

I am using Cognito with a hosted UI for user authentication and an HTTP API Gateway that uses a Lambda function for authorization. The API Gateway requests are handled by another Lambda function.

In the CloudWatch access logs for the API Gateway, I can log the username. I know that I can use stats count(*) by username in CloudWatch Logs Insights to count how many requests each user has sent to the API Gateway, but I don't know how to get the number of distinct users. count_distinct won't work because it only returns an approximation when the field has high cardinality.

In the end, I want to have a number widget in my CloudWatch dashboard that will show the distinct number of users who have used the service within the selected time range.

Upvotes: 0

Views: 535

Answers (1)

st.huber

Reputation: 1545

Using custom CloudWatch dashboard widgets, I decided to build a Lambda function that runs a Logs Insights query and renders the result as a custom widget.

[Image: user icon and number rendered as a custom CloudWatch dashboard widget]

import os
import time

import boto3
from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_classes import (
    CloudWatchDashboardCustomWidgetEvent,
    event_source,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

LOG_GROUP_NAME = os.environ["LOG_GROUP_NAME"]

logger = Logger()
cloud_watch_logs = boto3.client("logs")

DOCS = """
## User Widget
A script to get the number of unique users accessing the API in a given time range.
"""

CSS = """
<style>
.container {
    align-content: center;
    align-items: center;
    display: flex;
    flex-direction: row;
    justify-content: center;
    width: 100%;
}

.value {
    font-size: 45px;
}
</style>"""


def get_unique_api_users(start_time: int, end_time: int) -> int:
    # Group by user so that the result contains exactly one row per distinct user.
    start_query_response = cloud_watch_logs.start_query(
        logGroupName=LOG_GROUP_NAME,
        startTime=start_time,
        endTime=end_time,
        queryString='filter ispresent(user) and user != "-" | stats count(*) as userCount by user',
        limit=10000,  # the row count returned below cannot exceed this limit
    )

    # Poll until the query reaches a terminal state, pausing between calls
    # instead of calling the API in a tight loop.
    while True:
        response = cloud_watch_logs.get_query_results(
            queryId=start_query_response["queryId"]
        )
        status = response["status"]
        if status == "Complete":
            break
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"Logs Insights query ended with status {status}")
        time.sleep(0.5)

    # One result row per user, so the row count is the number of distinct users.
    return len(response["results"])


@logger.inject_lambda_context(log_event=False)
@event_source(data_class=CloudWatchDashboardCustomWidgetEvent)
def lambda_handler(event: CloudWatchDashboardCustomWidgetEvent, context: LambdaContext):
    if event.describe:
        return DOCS

    start_time = event.widget_context.time_range.start
    end_time = event.widget_context.time_range.end
    if event.widget_context.time_range.zoom_start:
        start_time = event.widget_context.time_range.zoom_start
        end_time = event.widget_context.time_range.zoom_end

    return f"""
{CSS}
<div class="container">
    <div class="value">
        🧑 {get_unique_api_users(start_time=start_time, end_time=end_time)}
    </div>
</div>"""

This approach returns the exact number of API users (up to the 10,000-row limit set in the query). On the downside, the query takes longer the more logs we scan and the more users we have. Also, every time the widget refreshes, a Lambda function is invoked, which counts towards the region's concurrent execution limit and costs money on each invocation, though arguably very little.
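To show the result on a dashboard, the function has to be referenced from a widget of type "custom" in the dashboard body. The following is a minimal sketch of how such a body could be built. The helper name build_dashboard_body, the widget size and title, and the ARN are placeholders for illustration and are not taken from my actual setup.

```python
import json


def build_dashboard_body(widget_lambda_arn: str, title: str = "Unique API users") -> str:
    """Return a dashboard body (JSON string) with one Lambda-backed custom widget."""
    widget = {
        "type": "custom",
        "x": 0,
        "y": 0,
        "width": 6,
        "height": 4,
        "properties": {
            # ARN of the Lambda function that renders the widget HTML
            "endpoint": widget_lambda_arn,
            "title": title,
            # Re-invoke the function on refresh and when the time range changes
            "updateOn": {"refresh": True, "resize": False, "timeRange": True},
        },
    }
    return json.dumps({"widgets": [widget]})


# Placeholder ARN, for illustration only
body = build_dashboard_body(
    "arn:aws:lambda:eu-central-1:123456789012:function:unique-users-widget"
)
print(body)
```

The resulting string can then be passed as DashboardBody to put_dashboard on a boto3 "cloudwatch" client, together with a DashboardName of your choice. Setting timeRange to True under updateOn is what makes the widget follow the time range selected on the dashboard.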

Upvotes: 1
