Reputation: 1449
My function returns a list of dictionaries. Some JobName values have multiple entries that differ only in the StartedOn field. I've managed to sort the list by date, but now I want to return only the most recent entry for each JobName, as below:
Current output:
{'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 17, 10, 832000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
{'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 12, 46, 547000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
{'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 39, 19, 895000, tzinfo=tzlocal()), 'JobRunState': 'SUCCEEDED'}
{'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 20, 29, 357000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
{'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 15, 57, 31, 513000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
Expected output:
{'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 17, 10, 832000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
{'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 39, 19, 895000, tzinfo=tzlocal()), 'JobRunState': 'SUCCEEDED'}
Code:
    import boto3

    glue_client = boto3.client("glue")  # assumed setup; not shown in the original post
    j = ["job_1", "job_2"]

    def job_status(job_name):
        # Page through every run of the given Glue job.
        paginator = glue_client.get_paginator('get_job_runs')
        response = paginator.paginate(JobName=job_name)
        return response

    def filtered_data(j):
        final_list = []
        # job_status is a module-level function, not a method of glue_client.
        jobs = [job_status(e) for e in j]
        for e in jobs:
            for page in e:
                final_list.append(page["JobRuns"])
        flat_list = [item for sublist in final_list for item in sublist]
        sorted_list = sorted(flat_list, key=lambda k: (k['JobName'], k['StartedOn']), reverse=True)
        return sorted_list
Upvotes: 0
Views: 78
Reputation: 781769
Create a dictionary keyed by JobName. Loop through the list of dictionaries, replacing the stored entry whenever the StartedOn time is later than that of the entry currently stored for that job.
So change:

    flat_list = [item for sublist in final_list for item in sublist]

to

    data_dict = {}
    for sublist in final_list:
        for item in sublist:
            jobname = item['JobName']
            if jobname not in data_dict or item['StartedOn'] > data_dict[jobname]['StartedOn']:
                data_dict[jobname] = item
    flat_list = list(data_dict.values())
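
To try the idea without calling AWS, here is a minimal self-contained sketch. The sample data is hypothetical (timestamps taken from the question, truncated to whole seconds and without time zones), and the helper name latest_run_per_job is my own, not part of boto3.

```python
from datetime import datetime

# Hypothetical sample data shaped like the page["JobRuns"] lists in the question.
final_list = [
    [
        {"JobName": "job_1", "StartedOn": datetime(2022, 7, 19, 18, 12, 46), "JobRunState": "FAILED"},
        {"JobName": "job_1", "StartedOn": datetime(2022, 7, 19, 18, 17, 10), "JobRunState": "FAILED"},
    ],
    [
        {"JobName": "job_2", "StartedOn": datetime(2022, 7, 4, 15, 57, 31), "JobRunState": "FAILED"},
        {"JobName": "job_2", "StartedOn": datetime(2022, 7, 4, 16, 39, 19), "JobRunState": "SUCCEEDED"},
        {"JobName": "job_2", "StartedOn": datetime(2022, 7, 4, 16, 20, 29), "JobRunState": "FAILED"},
    ],
]

def latest_run_per_job(pages):
    """Return one run per JobName: the one with the latest StartedOn."""
    latest = {}
    for page in pages:
        for run in page:
            name = run["JobName"]
            # Keep the run if the job is new or this run started later.
            if name not in latest or run["StartedOn"] > latest[name]["StartedOn"]:
                latest[name] = run
    return list(latest.values())

flat_list = latest_run_per_job(final_list)
for run in flat_list:
    print(run["JobName"], run["StartedOn"].isoformat(), run["JobRunState"])
# job_1 2022-07-19T18:17:10 FAILED
# job_2 2022-07-04T16:39:19 SUCCEEDED
```

The input does not need to be sorted for this to work, since every run is compared against the best one seen so far for its job. The result comes out in the order each JobName was first encountered, so sort flat_list afterwards if you need a specific order.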
Upvotes: 2