Reputation: 23
I have a large JSON-format log file that has the fields StartDate and StartTime, and each log entry ends with EndDate and EndTime.
A sample of my input log file, with 4 entries, is below. The full log file contains entries covering several days.
{ "Utility":"DBUpdate", "StartDate":"2020-09-21", "StartTime":"14:41:12", "Server":"eaidev", "Userid":"sx50067", "TrueExit":"No", "WaitInterval":30, "Cluster":"1", "Source":"MANNING1", "Target":"MANNING2", "ClusterListCt":5, "ListCt":55, "RequestServer":"MANNING3", "Reply":"JOT4", "ISC(Source)":0, "EndDate":"2020-09-21", "EndTime":"14:41:21", "ExitCode":0 }
{ "Utility":"DBUpdate", "StartDate":"2020-09-22", "StartTime":"14:41:12", "Server":"eaidev", "Userid":"sx50067", "TrueExit":"No", "WaitInterval":30, "Cluster":"1", "Source":"MANNING1", "Target":"MANNING2", "ClusterListCt":5, "ListCt":55, "RequestServer":"MANNING3", "Reply":"JOT4", "ISC(Source)":0, "EndDate":"2020-09-22", "EndTime":"14:41:21", "ExitCode":0 }
{ "Utility":"DBUpdate", "StartDate":"2020-09-23", "StartTime":"14:41:12", "Server":"eaidev", "Userid":"sx50067", "TrueExit":"No", "WaitInterval":30, "Cluster":"1", "Source":"MANNING1", "Target":"MANNING2", "ClusterListCt":5, "ListCt":55, "RequestServer":"MANNING3", "Reply":"JOT4", "ISC(Source)":0, "EndDate":"2020-09-23", "EndTime":"14:41:29", "ExitCode":0 }
{ "Utility":"DBUpdate", "StartDate":"2020-09-23", "StartTime":"14:42:12", "Server":"eaidev", "Userid":"sx50067", "TrueExit":"No", "WaitInterval":30, "Cluster":"1", "Source":"MANNING1", "Target":"MANNING2", "ClusterListCt":5, "ListCt":55, "RequestServer":"MANNING3", "Reply":"JOT4", "ISC(Source)":0, "EndDate":"2020-09-23", "EndTime":"14:43:21", "ExitCode":0 }
In a separate script, I run a job and capture its start date and time and its end date and time in a temp file, like below.
2020-09-23 14:41:12
2020-09-23 14:43:21
I am using variables like the below in my script to capture these times.
END_DATETIME=$(date '+%Y-%m-%d %T')
DATE=$(echo "${END_DATETIME}" | cut -f1 -d' ')
TIME=$(echo "${END_DATETIME}" | cut -f2 -d' ')
Using the start and end date-times of my program from that temp file, I want to capture all the log file lines between my start time and end time and write them to a file.
I expect my new log file to be like this:
{ "Utility":"DBUpdate", "StartDate":"2020-09-23", "StartTime":"14:41:12", "Server":"eaidev", "Userid":"sx50067", "TrueExit":"No", "WaitInterval":30, "Cluster":"1", "Source":"MANNING1", "Target":"MANNING2", "ClusterListCt":5, "ListCt":55, "RequestServer":"MANNING3", "Reply":"JOT4", "ISC(Source)":0, "EndDate":"2020-09-23", "EndTime":"14:41:29", "ExitCode":0 }
{ "Utility":"DBUpdate", "StartDate":"2020-09-23", "StartTime":"14:42:12", "Server":"eaidev", "Userid":"sx50067", "TrueExit":"No", "WaitInterval":30, "Cluster":"1", "Source":"MANNING1", "Target":"MANNING2", "ClusterListCt":5, "ListCt":55, "RequestServer":"MANNING3", "Reply":"JOT4", "ISC(Source)":0, "EndDate":"2020-09-23", "EndTime":"14:43:21", "ExitCode":0 }
I am able to capture logs based on date, but when it comes to time, I am getting more than what I want. Can you please suggest an approach?
Upvotes: 2
Views: 267
Reputation: 5975
Your data have an obvious pitfall: the use of two different fields (StartDate and StartTime) instead of ONE "datetime" field, which is standard and well-known across programming languages and data types. If you want to compare dates, then you have to compare combinations of these fields.
Furthermore, if you have to consider more things about these dates, like timezones or daylight saving periods, this structure becomes more frustrating for no reason.
Another note: it seems that you use JSON but treat it as a text file with one record per line. JSON isn't necessarily printed like this, and it could contain characters in places that break simple text parsing based on column positions or pattern matching.
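As a small illustration of that point, here is a sketch that assumes jq is installed. The record is a made-up, cut-down sample: pretty-printed over several lines it is still a single JSON value, and jq -c . re-emits it on one line, so line-oriented tools see one record per line again.

```shell
#!/bin/bash
# A pretty-printed record spanning several lines (shortened sample data).
pretty='{
  "Utility": "DBUpdate",
  "StartDate": "2020-09-23",
  "StartTime": "14:41:12"
}'

# jq parses whole JSON values, not lines; -c prints each value compactly on one line.
compact=$(printf '%s\n' "$pretty" | jq -c .)
echo "$compact"
```

A grep or cut on the pretty-printed form would see five separate lines, while jq sees one record either way.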
In general, to filter your JSON and keep only the records with a field value inside a range:
jq 'select(.StartDate > "2020-09-22" and .StartDate < "2020-09-24")' file.json
You can pass bash variables to the above like this:
#!/bin/bash
start_date="2020-09-22"
end_date="2020-09-24"
jq -c --arg s "$start_date" \
--arg e "$end_date" \
'select(.StartDate > $s and .StartDate < $e)' file.json
I have also added -c to print records one per line, because I think you really want this. Now you can add any variables and any conditions on StartDate and StartTime to get what you want.
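For the window in the question, one possible set of conditions could look like the sketch below. It assumes jq is installed, that dates and times are always zero-padded (so plain string comparison orders them correctly), and that a record should be kept when it starts at or after the job start and ends at or before the job end. The names start_dt, end_dt, log and filtered are made up for this example, and the records are cut down to the fields used.

```shell
#!/bin/bash
# Job window as captured in the temp file (values from the question).
start_dt="2020-09-23 14:41:12"
end_dt="2020-09-23 14:43:21"

# Three sample records, cut down to the fields used (illustrative data).
log='{"StartDate":"2020-09-22","StartTime":"14:41:12","EndDate":"2020-09-22","EndTime":"14:41:21"}
{"StartDate":"2020-09-23","StartTime":"14:41:12","EndDate":"2020-09-23","EndTime":"14:41:29"}
{"StartDate":"2020-09-23","StartTime":"14:42:12","EndDate":"2020-09-23","EndTime":"14:43:21"}'

# Join date and time with a space so each side compares like a timestamp;
# the bounds are inclusive (>= and <=).
filtered=$(printf '%s\n' "$log" | jq -c --arg s "$start_dt" --arg e "$end_dt" '
  select((.StartDate + " " + .StartTime) >= $s
     and (.EndDate + " " + .EndTime) <= $e)')
echo "$filtered"
```

Comparing the date and the time separately is what tends to return too much or too little: a time condition alone matches that time of day on every date, so the two parts need to be compared as one value.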
Also, here is a simple way to concatenate {Start|End}{Date|Time} of your JSON into easily sortable datetime fields.
jq -c '.StartDate = "\(.StartDate)_\(.StartTime)"
| .EndDate = "\(.EndDate)_\(.EndTime)"
| del(.StartTime, .EndTime)' file.json
So you will not need to add different conditions for date or time. Output:
{"Utility":"DBUpdate","StartDate":"2020-09-21_14:41:12","Server":"eaidev","Userid":"sx50067","TrueExit":"No","WaitInterval":30,"Cluster":"1","Source":"MANNING1","Target":"MANNING2","ClusterListCt":5,"ListCt":55,"RequestServer":"MANNING3","Reply":"JOT4","ISC(Source)":0,"EndDate":"2020-09-21_14:41:21","ExitCode":0}
{"Utility":"DBUpdate","StartDate":"2020-09-22_14:41:12","Server":"eaidev","Userid":"sx50067","TrueExit":"No","WaitInterval":30,"Cluster":"1","Source":"MANNING1","Target":"MANNING2","ClusterListCt":5,"ListCt":55,"RequestServer":"MANNING3","Reply":"JOT4","ISC(Source)":0,"EndDate":"2020-09-22_14:41:21","ExitCode":0}
{"Utility":"DBUpdate","StartDate":"2020-09-23_14:41:12","Server":"eaidev","Userid":"sx50067","TrueExit":"No","WaitInterval":30,"Cluster":"1","Source":"MANNING1","Target":"MANNING2","ClusterListCt":5,"ListCt":55,"RequestServer":"MANNING3","Reply":"JOT4","ISC(Source)":0,"EndDate":"2020-09-23_14:41:29","ExitCode":0}
{"Utility":"DBUpdate","StartDate":"2020-09-23_14:42:12","Server":"eaidev","Userid":"sx50067","TrueExit":"No","WaitInterval":30,"Cluster":"1","Source":"MANNING1","Target":"MANNING2","ClusterListCt":5,"ListCt":55,"RequestServer":"MANNING3","Reply":"JOT4","ISC(Source)":0,"EndDate":"2020-09-23_14:43:21","ExitCode":0}
Upvotes: 3