user9525103

Get max from dict.values() with the same key

I have a CSV file. Let's say I've already read it with DictReader and now have a list of dicts, like {'name': 'Andrew', 'points': 18}, etc.

name    points
Andrew  18
Kate    10
Jack    55
Andrew  31
Andrew  100
Jack    58
Andrew  34
Kate    22
Jack    5
Andrew  72

What I want to do is to return a key-value pair like Andrew: (5, 100), where the value is:

  1. how many times the name appears in the list;
  2. the maximum points value for that name.

I have no problem with the first task, but can't find the solution for the 2nd one. That's what I tried to do:

name_counter = defaultdict(int)
max_points = defaultdict(int)
for dictionary in list_from_csv:
    name_counter[dictionary['name']] += 1 #every time I meet the name, I add +1 to the value
    max_points[dictionary['name']] = ??? 

I was thinking of just using max(dictionary['points']), but max needs several numbers to choose from, not just one. Maybe I could build a list, but I'm not sure how. Any other ideas?

Would appreciate any help.

P.S. And after I have these 2 dicts, I will need to merge them, based on the key, but I hope it is not that hard.

Upvotes: 0

Views: 103

Answers (4)

Gergely M

Reputation: 733

Here's a solution without using any extra import other than csv.

I've used your sample data as a csv file. I read its content and built a list of (name, points) tuples:

import csv
list_of_tuples = []

with open('f1.csv', newline='') as csv_file:
    dict_of_csv = csv.DictReader(csv_file)
    for item in dict_of_csv:
        list_of_tuples.append((item['name'], item['points']))

The list_of_tuples looks like this

[('Andrew', '18'), ('Kate', '10'), ('Jack', '55'), ('Andrew', '31'), ('Andrew', '100'), ('Jack', '58'), ('Andrew', '34'), ('Kate', '22'), ('Jack', '5'), ('Andrew', '72')]

The result_dict stores data in {key: (tuple_0, tuple_1), } format like

{ name: (name_count, max_points),
  name1: (name_count1, max_points1),
  ...
}

The values in a dictionary are looked up by their key, which is the name in this case: dictionary['key'], so here result_dict[name]. The items of a tuple are accessed by index, just like a list: tuple[0] and tuple[1].
So here they are result_dict[name][0] and result_dict[name][1].

result_dict = {}
for dict_item in list_of_tuples:
    name = dict_item[0]
    points = int(dict_item[1])
    if name in result_dict:
        name_count = result_dict[name][0]
        max_points = result_dict[name][1]
        result_dict[name] = (name_count + 1, points if max_points < points else max_points)
    else:
        # the name isn't in the dictionary, so we add the "name: (name_count, max_points)" to it
        result_dict[name] = (1, points)

The output is:

{'Andrew': (5, 100), 'Kate': (2, 22), 'Jack': (3, 58)}

Upvotes: 0

jpp

Reputation: 164773

For completeness, here's the 3rd party Pandas one-liner:

res = df.groupby('name')['points'].agg(['size', 'max'])

Result

print(res)

        size  max
name             
Andrew     5  100
Jack       3   58
Kate       2   22

Setup

import pandas as pd
from io import StringIO

mystr = StringIO("""name    points
Andrew  18
Kate    10
Jack    55
Andrew  31
Andrew  100
Jack    58
Andrew  34
Kate    22
Jack    5
Andrew  72""")

df = pd.read_csv(mystr, sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas

Upvotes: 0

abarnert

Reputation: 365925

You just need to work out what to do to max_points[name] each time you get a new value, right?

Let's pretend that, at each iteration, max_points[name] has already been correctly set to the highest value that you've seen so far. So, what do you need to do with the new value?

Simple: if points is bigger than the highest value you've seen so far, it's the new highest value; if not, the old highest value is the new highest value.

Which is exactly what max does. So:

max_points[dictionary['name']] = max(max_points[dictionary['name']], int(dictionary['points']))

Now we just need to verify that assumption was correct.

  • Since you're using defaultdict(int), it always starts at 0. If you can have negative scores, that's already wrong, but otherwise, it's correct—the highest score you've seen so far, for anyone, is 0.

  • At each step, if it was correct at the previous step, it's correct after the next step, because that's what max does.

  • So, by induction, it's correct at the end.
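
If negative scores are possible, one way around the 0 starting value is to use a plain dict and let the first value seen for a name act as its own starting maximum. A minimal sketch, assuming the rows are dicts with 'name' and 'points' keys; the sample rows here are made up so that they include negatives:

```python
# Hypothetical sample rows that include negative scores.
list_from_csv = [
    {'name': 'Andrew', 'points': '-5'},
    {'name': 'Kate', 'points': '10'},
    {'name': 'Andrew', 'points': '-2'},
]

max_points = {}
for dictionary in list_from_csv:
    name = dictionary['name']
    points = int(dictionary['points'])  # DictReader yields strings
    # For a new name, .get() falls back to points itself,
    # so the first value becomes the initial maximum.
    max_points[name] = max(max_points.get(name, points), points)

print(max_points)  # {'Andrew': -2, 'Kate': 10}
```

With defaultdict(int), Andrew would have ended up with 0, a score he never had.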


As a side note, instead of repeating dictionary['name'] over and over, it might look nicer like this:

for dictionary in list_from_csv:
    name = dictionary['name']
    points = int(dictionary['points'])  # DictReader gives strings, so convert
    name_counter[name] += 1
    max_points[name] = max(max_points[name], points)
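
On the P.S. about merging: both dicts end up with exactly the same keys, so a dict comprehension should be enough. A minimal sketch, with the question's sample rows typed in by hand as strings (as DictReader would produce them); the names rows and merged are only illustrative:

```python
from collections import defaultdict

# The question's sample data, rebuilt as DictReader-style rows (string values).
rows = [('Andrew', 18), ('Kate', 10), ('Jack', 55), ('Andrew', 31),
        ('Andrew', 100), ('Jack', 58), ('Andrew', 34), ('Kate', 22),
        ('Jack', 5), ('Andrew', 72)]
list_from_csv = [{'name': n, 'points': str(p)} for n, p in rows]

name_counter = defaultdict(int)
max_points = defaultdict(int)
for dictionary in list_from_csv:
    name = dictionary['name']
    points = int(dictionary['points'])
    name_counter[name] += 1
    max_points[name] = max(max_points[name], points)

# Same keys in both dicts, so pair the values up name by name.
merged = {name: (name_counter[name], max_points[name]) for name in name_counter}
print(merged)  # {'Andrew': (5, 100), 'Kate': (2, 22), 'Jack': (3, 58)}
```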

Upvotes: 2

Ajax1234

Reputation: 71461

You can use itertools.groupby:

import itertools
data = [{'name': 'Andrew', 'points': 18}, {'name': 'Kate', 'points': 10}, {'name': 'Jack', 'points': 55}, {'name': 'Andrew', 'points': 31}, {'name': 'Andrew', 'points': 100}, {'name': 'Jack', 'points': 58}, {'name': 'Andrew', 'points': 34}, {'name': 'Kate', 'points': 22}, {'name': 'Jack', 'points': 5}, {'name': 'Andrew', 'points': 72}]
grouped_data = [[a, list(b)] for a, b in itertools.groupby(sorted(data, key=lambda x:x['name']), key=lambda x:x['name'])]
final_data = [{a:(len(b), max(b, key=lambda x:x['points'])['points'])} for a, b in grouped_data]

Output:

[{'Andrew': (5, 100)}, {'Jack': (3, 58)}, {'Kate': (2, 22)}]
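
If a single dict is preferred over a list of one-key dicts, the same groupby approach can fill one dict directly. A sketch under the same assumptions as above (integer points, data sorted by name before grouping); by_name is just a helper name chosen for this example:

```python
import itertools
from operator import itemgetter

data = [{'name': 'Andrew', 'points': 18}, {'name': 'Kate', 'points': 10},
        {'name': 'Jack', 'points': 55}, {'name': 'Andrew', 'points': 31},
        {'name': 'Andrew', 'points': 100}, {'name': 'Jack', 'points': 58},
        {'name': 'Andrew', 'points': 34}, {'name': 'Kate', 'points': 22},
        {'name': 'Jack', 'points': 5}, {'name': 'Andrew', 'points': 72}]

by_name = itemgetter('name')
final_data = {}
# groupby only groups adjacent items, so the data must be sorted by the same key first.
for name, group in itertools.groupby(sorted(data, key=by_name), key=by_name):
    points = [row['points'] for row in group]
    final_data[name] = (len(points), max(points))

print(final_data)  # {'Andrew': (5, 100), 'Jack': (3, 58), 'Kate': (2, 22)}
```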

Upvotes: 0
