Reputation: 4941
I have a Lambda function that writes metrics to CloudWatch. While it writes metrics, it also generates some logs in a log group.
INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local
INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local
INFO:: username: [email protected] ClinicID: 7668 nodename: MacBook-Pro-2.local
INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local
I would like to query AWS logs for the past x hours, where x could be anywhere between 12 and 24 hours, based on any of the params. For example:
ClinicID=7667
or
ClinicID=7667 and username='[email protected]'
or
username='[email protected]'
I am using boto3 in Python.
Upvotes: 66
Views: 93633
Reputation: 1390
I prefer client.filter_log_events from boto3.
import boto3

# your function to parse the event.
def parse_event(event):
    pass

# my solution:
def grab_data(aws_region,
              aws_access_key_id,
              aws_secret_access_key,
              log_group_name,
              list_log_stream,
              start_time,
              end_time,
              filter_pattern):
    """
    Fetch log events from AWS CloudWatch Logs.

    Args
    ----
    aws_region (str): AWS region.
    aws_access_key_id (str): AWS access key ID.
    aws_secret_access_key (str): AWS secret access key.
    log_group_name (str): Name of the log group.
    list_log_stream (list): List of log stream names.
    start_time (int): Start time for log event retrieval in milliseconds.
    end_time (int): End time for log event retrieval in milliseconds.
    filter_pattern (str): Pattern to filter log events.

    Returns
    -------
    list: List of strings representing parsed log events.
    """
    client = boto3.client(
        service_name='logs',
        region_name=aws_region,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key
    )
    rows = []
    next_token = {}
    while True:
        response = client.filter_log_events(
            logGroupName=log_group_name,
            logStreamNames=list_log_stream,
            startTime=start_time,
            endTime=end_time,
            filterPattern=filter_pattern,
            **next_token
        )
        # keep only the events that parse_event produced a result for
        rows += [str(item) for item in
                 (parse_event(event) for event in response['events'])
                 if item is not None]
        # filter_log_events paginates; loop until no nextToken is returned
        if 'nextToken' not in response:
            break
        next_token = {'nextToken': response['nextToken']}
    return rows
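For the question's use case, a call could look like the sketch below; the region, credentials, and stream name are placeholders, and the timestamps are epoch milliseconds, which is what filter_log_events expects:
from datetime import datetime, timedelta

# the past 12 hours, as epoch milliseconds
end_time = int(datetime.now().timestamp() * 1000)
start_time = int((datetime.now() - timedelta(hours=12)).timestamp() * 1000)

rows = grab_data(
    aws_region='us-east-1',                  # placeholder
    aws_access_key_id='YOUR_KEY_ID',         # placeholder
    aws_secret_access_key='YOUR_SECRET',     # placeholder
    log_group_name='/aws/lambda/lambdaFnName',
    list_log_stream=['your-stream-name'],    # placeholder stream name
    start_time=start_time,
    end_time=end_time,
    filter_pattern='"ClinicID: 7667"',       # quoted term matches the literal text
)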
Upvotes: 0
Reputation: 373
The easiest way is to use awswrangler:
import boto3
import awswrangler as wr
from datetime import datetime, timedelta

# must define this for wrangler to work
boto3.setup_default_session(region_name="us-east-1")  # your region

# e.g. the past 12 hours
to_timestamp = datetime.now()
from_timestamp = to_timestamp - timedelta(hours=12)

df = wr.cloudwatch.read_logs(
    log_group_names=["loggroup"],
    start_time=from_timestamp,
    end_time=to_timestamp,
    query="fields @timestamp, @message | sort @timestamp desc | limit 5",
)
You can pass a list of the log groups needed, start and end time. The output is a pandas DataFrame containing the results.
FYI, under the hood, awswrangler uses the same boto3 calls as in @dejan's answer.
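Since the result is an ordinary pandas DataFrame, you can also filter on the raw message text after the fact. A minimal sketch, assuming the message column comes back named message (depending on the awswrangler version it may keep the @message name):
# keep only rows whose raw log line mentions the clinic we care about
clinic_df = df[df["message"].str.contains("ClinicID: 7667", na=False)]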
Upvotes: 5
Reputation: 450
You can achieve this with the CloudWatch Logs client and a little bit of coding. You can customize the conditions, or use the json module for a precise result if your logs are structured as JSON.
EDIT
You can use describe_log_streams to get the streams. If you want only the latest, set limit=1; if you want more than one, use a for loop to iterate over all the streams while filtering, as mentioned below.
import boto3

client = boto3.client('logs')

## For the latest
stream_response = client.describe_log_streams(
    logGroupName="/aws/lambda/lambdaFnName",  # Can be dynamic
    orderBy='LastEventTime',                  # For the latest events
    descending=True,                          # newest stream first
    limit=1                                   # the single latest stream, if you just want one
)

# logStreams is a list, so take the first (and only) entry
latest_log_stream_name = stream_response["logStreams"][0]["logStreamName"]

response = client.get_log_events(
    logGroupName="/aws/lambda/lambdaFnName",
    logStreamName=latest_log_stream_name,
    startTime=12345678,
    endTime=12345678,
)

# event["message"] is a plain string, so match on substrings
for event in response["events"]:
    if "ClinicID: 7667" in event["message"]:
        print(event["message"])
    elif "username: [email protected]" in event["message"]:
        print(event["message"])
    # .
    # .
    # more if or else conditions
## For more than one stream, e.g. the latest 5
stream_response = client.describe_log_streams(
    logGroupName="/aws/lambda/lambdaFnName",  # Can be dynamic
    orderBy='LastEventTime',                  # For the latest events
    descending=True,                          # newest streams first
    limit=5
)

for log_stream in stream_response["logStreams"]:
    log_stream_name = log_stream["logStreamName"]
    response = client.get_log_events(
        logGroupName="/aws/lambda/lambdaFnName",
        logStreamName=log_stream_name,
        startTime=12345678,
        endTime=12345678,
    )
    ## For example, you want to search "ClinicID: 7667"; can be dynamic
    for event in response["events"]:
        if "ClinicID: 7667" in event["message"]:
            print(event["message"])
        elif "username: [email protected]" in event["message"]:
            print(event["message"])
        # .
        # .
        # more if or else conditions
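The startTime/endTime values above are placeholders; get_log_events expects epoch timestamps in milliseconds, so for the "past x hours" requirement you could compute them like this:
from datetime import datetime, timedelta

# cover the past 12 hours (adjust as needed, e.g. up to 24)
end_time = int(datetime.now().timestamp() * 1000)
start_time = int((datetime.now() - timedelta(hours=12)).timestamp() * 1000)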
Let me know how it goes.
Upvotes: 15
Reputation: 12089
You can get what you want using CloudWatch Logs Insights.
You would use the start_query and get_query_results APIs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html
To start a query you would use (for use case 2 from your question; 1 and 3 are similar):
import boto3
from datetime import datetime, timedelta
import time

client = boto3.client('logs')

query = ("fields @timestamp, @message "
         "| parse @message \"username: * ClinicID: * nodename: *\" as username, ClinicID, nodename "
         "| filter ClinicID = 7667 and username = '[email protected]'")

log_group = '/aws/lambda/NAME_OF_YOUR_LAMBDA_FUNCTION'

# start_query takes epoch timestamps in seconds
start_query_response = client.start_query(
    logGroupName=log_group,
    startTime=int((datetime.today() - timedelta(hours=5)).timestamp()),
    endTime=int(datetime.now().timestamp()),
    queryString=query,
)

query_id = start_query_response['queryId']

# poll until the query finishes ('Scheduled' and 'Running' mean it hasn't yet)
response = None
while response is None or response['status'] in ('Scheduled', 'Running'):
    print('Waiting for query to complete ...')
    time.sleep(1)
    response = client.get_query_results(
        queryId=query_id
    )
Response will contain your data in this format (plus some metadata):
{
    'results': [
        [
            {
                'field': '@timestamp',
                'value': '2019-12-09 17:07:24.428'
            },
            {
                'field': '@message',
                'value': 'username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local\n'
            },
            {
                'field': 'username',
                'value': '[email protected]'
            },
            {
                'field': 'ClinicID',
                'value': '7667'
            },
            {
                'field': 'nodename',
                'value': 'MacBook-Pro-2.local\n'
            }
        ]
    ]
}
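Each result row is a list of field/value pairs, so if a flat structure is easier to work with, you can collapse every row into a dict; a minimal sketch:
# turn each result row into a dict like {'ClinicID': '7667', 'username': ..., ...}
rows = [{col['field']: col['value'] for col in result}
        for result in response['results']]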
Upvotes: 107
Reputation: 8583
I used awslogs. If you install it, you can do the following; --watch will tail the new logs:
awslogs get /aws/lambda/log-group-1 --start="5h ago" --watch
You can install it using
pip install awslogs
To filter, you can do:
awslogs get /aws/lambda/log-group-1 --filter-pattern '"ClinicID=7667"' --start "5h ago" --timestamp
It supports multiple filter patterns as well.
awslogs get /aws/lambda/log-group-1 --filter-pattern '"ClinicID=7667"' --filter-pattern '" [email protected]"' --start "5h ago" --timestamp
Upvotes: 3