I have a CSV file. Let's say I've already read it with DictReader, so now I have a list full of dicts like {'name': 'Andrew', 'points': '18'}, and so on:
name points
Andrew 18
Kate 10
Jack 55
Andrew 31
Andrew 100
Jack 58
Andrew 34
Kate 22
Jack 5
Andrew 72
What I want to do is return a key-value pair like Andrew: (5, 100), where the value is:
1. the number of times the name appears, and
2. the maximum number of points for that name.
I have no problem with the first task, but I can't find a solution for the second one. This is what I tried:
from collections import defaultdict

name_counter = defaultdict(int)
max_points = defaultdict(int)
for dictionary in list_from_csv:
    name_counter[dictionary['name']] += 1  # every time I meet the name, I add 1 to its count
    # max_points[dictionary['name']] = ???  <- this is the part I can't work out
I thought about just using max(dictionary['points']), but max needs several numbers to pick from, not just one. Maybe I should build a list, but I'm not sure how. Any other ideas?
Would appreciate any help.
P.S. Once I have these two dicts, I'll need to merge them by key, but I hope that part isn't too hard.
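For the merge step, a minimal sketch, assuming name_counter and max_points have exactly the same keys (which is true when both are filled in the same loop). The values below are hypothetical stand-ins, worked out by hand from the sample table:

```python
from collections import defaultdict

# Hypothetical values standing in for the two dicts built in the loop
name_counter = defaultdict(int, {'Andrew': 5, 'Kate': 2, 'Jack': 3})
max_points = defaultdict(int, {'Andrew': 100, 'Kate': 22, 'Jack': 58})

# Both dicts share the same keys, so iterate over one and look up the other
merged = {name: (count, max_points[name]) for name, count in name_counter.items()}
print(merged)  # {'Andrew': (5, 100), 'Kate': (2, 22), 'Jack': (3, 58)}
```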
Upvotes: 0
Views: 103
Reputation: 733
Here's a solution that doesn't import anything other than csv.
I saved your sample data as a comma-separated file, read it with DictReader, and built a list of (name, points) tuples:
import csv

list_of_tuples = []
with open('f1.csv', newline='') as csv_file:
    dict_of_csv = csv.DictReader(csv_file)
    for item in dict_of_csv:
        list_of_tuples.append((item['name'], item['points']))
list_of_tuples then looks like this:
[('Andrew', '18'), ('Kate', '10'), ('Jack', '55'), ('Andrew', '31'), ('Andrew', '100'), ('Jack', '58'), ('Andrew', '34'), ('Kate', '22'), ('Jack', '5'), ('Andrew', '72')]
result_dict stores the data in this format:
{ name: (name_count, max_points),
  name1: (name_count1, max_points1),
  ...
}
Values in a dictionary are looked up by their key, which here is the name: dictionary['key'] in general, so result_dict[name] in this case.
The items in a tuple can be accessed by index, just like a list (tuple[0] and tuple[1]), so the count is result_dict[name][0] and the maximum is result_dict[name][1].
result_dict = {}
for dict_item in list_of_tuples:
    name = dict_item[0]
    points = int(dict_item[1])
    if name in result_dict:
        name_count = result_dict[name][0]
        max_points = result_dict[name][1]
        result_dict[name] = (name_count + 1, points if max_points < points else max_points)
    else:
        # the name isn't in the dictionary, so we add the "name: (name_count, max_points)" to it
        result_dict[name] = (1, points)
Printing result_dict gives:
{'Andrew': (5, 100), 'Kate': (2, 22), 'Jack': (3, 58)}
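To read the two numbers back out of result_dict, you can index the tuple as described above or unpack it directly while looping. A short sketch using the output shown:

```python
result_dict = {'Andrew': (5, 100), 'Kate': (2, 22), 'Jack': (3, 58)}

# Index the tuple: [0] is the count, [1] is the maximum
andrew_count = result_dict['Andrew'][0]
andrew_max = result_dict['Andrew'][1]

# Or unpack each (name_count, max_points) tuple in the loop header
for name, (name_count, max_points) in result_dict.items():
    print(f'{name}: seen {name_count} times, best score {max_points}')
```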
Upvotes: 0
Reputation: 164773
For completeness, here's a one-liner using the third-party pandas library:
res = df.groupby('name')['points'].agg(['size', 'max'])
Result
print(res)
size max
name
Andrew 5 100
Jack 3 58
Kate 2 22
Setup
import pandas as pd
from io import StringIO
mystr = StringIO("""name points
Andrew 18
Kate 10
Jack 55
Andrew 31
Andrew 100
Jack 58
Andrew 34
Kate 22
Jack 5
Andrew 72""")
df = pd.read_csv(mystr, sep=r'\s+')  # delim_whitespace=True is deprecated since pandas 2.2
Upvotes: 0
Reputation: 365925
You just need to work out what to do with max_points[name] each time you get a new value, right?
Let's pretend that, at each iteration, max_points[name] has already been correctly set to the highest value you've seen so far. So what do you need to do with the new value?
Simple: if points is bigger than the highest value you've seen so far, it's the new highest value; if not, the old highest value stays the highest value.
That is exactly what max does. So:
# DictReader returns strings, so convert the points to int before comparing
max_points[dictionary['name']] = max(max_points[dictionary['name']], int(dictionary['points']))
Now we just need to verify that the assumption holds.
Since you're using defaultdict(int), every name starts at 0. If scores can be negative, that's already wrong, but otherwise it's correct: before you've seen any scores for a name, 0 is a valid "highest score so far".
At each step, if the value was correct before the step, it's still correct after it, because that's what max does.
So, by induction, it's correct at the end.
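The induction only works if the starting value is no larger than any real score. A minimal sketch of the negative-score case, using made-up rows for illustration: starting from float('-inf') instead of 0 means the first real score always wins.

```python
from collections import defaultdict

# Hypothetical rows with negative scores, to show why the starting value matters
rows = [{'name': 'Kate', 'points': '-7'}, {'name': 'Kate', 'points': '-3'}]

zero_start = defaultdict(int)                   # every name starts at 0
inf_start = defaultdict(lambda: float('-inf'))  # every name starts below any real score

for row in rows:
    name = row['name']
    points = int(row['points'])  # DictReader yields strings, so convert first
    zero_start[name] = max(zero_start[name], points)
    inf_start[name] = max(inf_start[name], points)

print(zero_start['Kate'])  # 0, a value that never appeared in the data
print(inf_start['Kate'])   # -3, the correct maximum
```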
As a side note, instead of repeating dictionary['name'] over and over, it might look nicer like this:
for dictionary in list_from_csv:
    name = dictionary['name']
    points = int(dictionary['points'])
    name_counter[name] += 1
    max_points[name] = max(max_points[name], points)
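Put together, a self-contained sketch of the whole loop. It assumes the file is comma-separated with name and points columns; io.StringIO stands in for the real file (the filename in the comment is hypothetical), and the int() call is needed because DictReader returns every field as a string:

```python
import csv
import io
from collections import defaultdict

# Stand-in for open('your_file.csv', newline=''), where 'your_file.csv' is a hypothetical name;
# the rows are the sample data from the question
csv_text = """name,points
Andrew,18
Kate,10
Jack,55
Andrew,31
Andrew,100
Jack,58
Andrew,34
Kate,22
Jack,5
Andrew,72
"""

list_from_csv = list(csv.DictReader(io.StringIO(csv_text)))

name_counter = defaultdict(int)
max_points = defaultdict(int)
for dictionary in list_from_csv:
    name = dictionary['name']
    points = int(dictionary['points'])  # fields come back as strings
    name_counter[name] += 1
    max_points[name] = max(max_points[name], points)

print(dict(name_counter))  # {'Andrew': 5, 'Kate': 2, 'Jack': 3}
print(dict(max_points))    # {'Andrew': 100, 'Kate': 22, 'Jack': 58}
```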
Upvotes: 2
Reputation: 71461
You can use itertools.groupby:
import itertools

data = [{'name': 'Andrew', 'points': 18}, {'name': 'Kate', 'points': 10}, {'name': 'Jack', 'points': 55}, {'name': 'Andrew', 'points': 31}, {'name': 'Andrew', 'points': 100}, {'name': 'Jack', 'points': 58}, {'name': 'Andrew', 'points': 34}, {'name': 'Kate', 'points': 22}, {'name': 'Jack', 'points': 5}, {'name': 'Andrew', 'points': 72}]
# groupby only groups consecutive items, so sort by name first
grouped_data = [[a, list(b)] for a, b in itertools.groupby(sorted(data, key=lambda x:x['name']), key=lambda x:x['name'])]
# for each name: (number of rows, highest 'points' value)
final_data = [{a:(len(b), max(b, key=lambda x:x['points'])['points'])} for a, b in grouped_data]
Output:
[{'Andrew': (5, 100)}, {'Jack': (3, 58)}, {'Kate': (2, 22)}]
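If you'd rather have a single dict than a list of one-key dicts, the same groupby idea works in a plain loop. A sketch reusing the data list above; it assumes the points are already ints, and operator.itemgetter is swapped in for the lambdas:

```python
import itertools
from operator import itemgetter

data = [{'name': 'Andrew', 'points': 18}, {'name': 'Kate', 'points': 10}, {'name': 'Jack', 'points': 55}, {'name': 'Andrew', 'points': 31}, {'name': 'Andrew', 'points': 100}, {'name': 'Jack', 'points': 58}, {'name': 'Andrew', 'points': 34}, {'name': 'Kate', 'points': 22}, {'name': 'Jack', 'points': 5}, {'name': 'Andrew', 'points': 72}]

by_name = itemgetter('name')
result = {}
# groupby needs the input sorted by the same key it groups on
for name, group in itertools.groupby(sorted(data, key=by_name), key=by_name):
    scores = [row['points'] for row in group]  # materialize the group once
    result[name] = (len(scores), max(scores))

print(result)  # {'Andrew': (5, 100), 'Jack': (3, 58), 'Kate': (2, 22)}
```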
Upvotes: 0