marcin2x4

Reputation: 1449

Python - return top value for each dict key

My function returns a list of dictionaries. In some cases one JobName has multiple entries, one per run, each with a different StartedOn value. I've managed to sort them by date, but now I want to return only the top (most recent) entry each time the JobName changes, as below:

Current output:

{'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 17, 10, 832000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
{'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 12, 46, 547000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
{'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 39, 19, 895000, tzinfo=tzlocal()), 'JobRunState': 'SUCCEEDED'}
{'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 20, 29, 357000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
{'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 15, 57, 31, 513000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}

Expected output:

{'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 17, 10, 832000, tzinfo=tzlocal()), 'JobRunState': 'FAILED'}
{'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 39, 19, 895000, tzinfo=tzlocal()), 'JobRunState': 'SUCCEEDED'}

Code:

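import boto3

glue_client = boto3.client('glue')

j = ["job_1", "job_2"]

def job_status(job_name):
    # Page through every run of the given Glue job
    paginator = glue_client.get_paginator('get_job_runs')
    response = paginator.paginate(JobName=job_name)
    return response


def filtered_data(j):
    final_list = []

    # job_status is a module-level function, not a method of glue_client
    jobs = [job_status(e) for e in j]

    for e in jobs:
        for page in e:
            final_list.append(page["JobRuns"])

    # Flatten the per-page lists into one list of runs
    flat_list = [item for sublist in final_list for item in sublist]
    # Sort by job name, then start time; reverse=True puts each job's newest run first
    sorted_list = sorted(flat_list, key=lambda k: (k['JobName'], k['StartedOn']), reverse=True)
    return sorted_list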
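Because sorted_list is ordered by JobName and, within each name, by StartedOn newest first (reverse=True), the first row in each run of identical JobName values is already the one to keep. A minimal sketch of "take the first row whenever JobName changes" using itertools.groupby, assuming the hypothetical sample data below (naive datetimes stand in for the tz-aware values Glue returns):

```python
import datetime
from itertools import groupby
from operator import itemgetter

# Hypothetical sample runs, sorted the same way as sorted_list:
# by (JobName, StartedOn) in reverse, so each job's newest run comes first.
# Naive datetimes replace the tz-aware ones from Glue (assumption, for a self-contained demo).
sorted_list = [
    {'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 39, 19), 'JobRunState': 'SUCCEEDED'},
    {'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 20, 29), 'JobRunState': 'FAILED'},
    {'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 15, 57, 31), 'JobRunState': 'FAILED'},
    {'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 17, 10), 'JobRunState': 'FAILED'},
    {'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 12, 46), 'JobRunState': 'FAILED'},
]

# groupby() starts a new group each time JobName changes; since every group
# is ordered newest first, next(group) is that job's most recent run.
top_runs = [next(group) for _, group in groupby(sorted_list, key=itemgetter('JobName'))]

for run in top_runs:
    print(run['JobName'], run['StartedOn'], run['JobRunState'])
```

Note that groupby only groups consecutive items, so this relies on the list already being sorted by JobName first.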

Upvotes: 0

Views: 78

Answers (1)

Barmar

Reputation: 781769

Create a dictionary keyed by JobName. Loop through the list of dictionaries, replacing a job's entry whenever the current item's StartedOn time is later than the one already stored.

So change:

flat_list = [item for sublist in final_list for item in sublist]

to

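data_dict = {}

for sublist in final_list:
    for item in sublist:
        jobname = item['JobName']
        # Keep the run if it's the first one seen for this job or newer than the stored one
        if jobname not in data_dict or item['StartedOn'] > data_dict[jobname]['StartedOn']:
            data_dict[jobname] = item
# One entry per job: its most recent run
flat_list = list(data_dict.values())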
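The same approach, wrapped as a self-contained function that runs without AWS access. The helper name latest_run_per_job and the sample data are hypothetical: each inner list stands in for one page["JobRuns"] list collected in final_list, and naive datetimes replace the tz-aware ones Glue returns.

```python
import datetime


def latest_run_per_job(final_list):
    """Return the most recent run for each JobName (hypothetical helper).

    final_list is a list of lists of run dicts, like the page["JobRuns"] lists.
    """
    data_dict = {}
    for sublist in final_list:
        for item in sublist:
            jobname = item['JobName']
            # Keep the run if it's the first seen for this job or newer than the stored one
            if jobname not in data_dict or item['StartedOn'] > data_dict[jobname]['StartedOn']:
                data_dict[jobname] = item
    # Dicts preserve insertion order (Python 3.7+), so jobs come out in first-seen order
    return list(data_dict.values())


# Hypothetical pages, deliberately unsorted to show no prior sort is needed
final_list = [
    [
        {'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 12, 46), 'JobRunState': 'FAILED'},
        {'JobName': 'job_1', 'StartedOn': datetime.datetime(2022, 7, 19, 18, 17, 10), 'JobRunState': 'FAILED'},
    ],
    [
        {'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 15, 57, 31), 'JobRunState': 'FAILED'},
        {'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 39, 19), 'JobRunState': 'SUCCEEDED'},
        {'JobName': 'job_2', 'StartedOn': datetime.datetime(2022, 7, 4, 16, 20, 29), 'JobRunState': 'FAILED'},
    ],
]

result = latest_run_per_job(final_list)
for run in result:
    print(run['JobName'], run['StartedOn'], run['JobRunState'])
```

Since it makes a single pass and compares timestamps directly, this version doesn't need the runs to be sorted beforehand.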

Upvotes: 2
