Reputation: 67

How can I find the top three highest rows based on a column in a csv?

I'm trying to iterate through each line of a csv and bring back the top 3 highscores. There's only 2 columns, one named 'users' and the other 'highscores'. I know what I have so far isn't much but I'm completely stumped. I feel like I could get the highest score by storing the value and iterate over each line and then replace it if it's less than a number, but I'm not sure what to do If I want the top three lines.

This is how I began:

import csv
a = open('highscoreslist.csv')
spreadsheet = csv.DictReader(a)
names = []
scores = []

for row in speadsheet:
  names.append(row['users'])
  scores.append(row['highscores'])

And now I just don't know what direction to take. I was going to put them in two lists and then find the highest that way, but they're already in a dictionary so that my be pointless. I'm also trying to learn this concept, so I would prefer not to do it in Pandas.

Upvotes: 2

Answers (6)

Rich Andrews

Reputation: 1670

This answer illustrates a Python solution to your problem and how things roll with Python.

sorted([(row['users'], row['highscores']) for row in csv.DictReader(a)],  key=lambda t: t[1], reverse=True)[:3]

You are asking for the three highest scores, which presumably you wish to also know the user otherwise this becomes trivially easy.

Given your code, the primary issue you are having is that you have placed names and scores in two independent data structures.

So get the data into the same datastructure. For that, you can loop over the dict rows of the CSV from the DictReader. Change those dicts into tuples. And don't use a loop, use a list comprehension.

[(row['users'], row['highscores']) for row in csv.DictReader(a)]

You will get [(1,2),(6,5),(6,7)] etc.

Then use a 'lambda' this is a small function that can be passed into other functions, like sorted().

sorted() works on a collection that is passed in, such as the above list of tuples. And, it takes some very useful arguments, not the least of which is a key to sort by. So, we can get fancy and use the lambda to specify which key we wish sorting to occur by.

sorted([(1,2),(6,5),(6,7)], key= the lambda) and lambda t: t[1] says return the second element of each tuple (the user's high score). So then sorted() sorts by highscore.

And then you slice your result by list slicing, another super cool python thing. [:3] says give me the first three elements - and sorted from high to low gives you the top three scores, and their users, because you sorted(...,reverse=True).

You can than access the resultant list of tuples to show the high scores of a game you are hoping to make/automate!

Upvotes: 1

Jab

Reputation: 27495

This can be done with a collections.Counter very easily as well.

import csv
from collections import Counter

players = Counter()

with open('highscoreslist.csv') as a:
    spreadsheet = csv.DictReader(a)

    for row in speadsheet:
        name, score = row['users'], row['score']
        players[name] = score

top_3 = players.most_common(3)
print(*(f'{name}: {score}' for name, score in top_3), sep='\n'))

This prints the top 3 in format:

name: score
name: score
name: score

Upvotes: 0

Ardweaden

Reputation: 887

If you want to do it on the run, you could simply store the top three highscores and corresponding users in order. On each iteration, check if the new higscore is larger than any of the three you have; if it's larger than the smallest of the three, replace it. If it's larger than the biggest, replace the first one and shift down the other two etc. perhaps this pseudocode makes it more understandable.

score1 = 0
score2 = 0
score3 = 0

#score3 >= score2 >= score 1

for score in file:
    if score > score3:
        if score <= score2:
            score1 = score
        elif score <= score3:
            score1 = score2
            score2 = score
        elif score > score3:
            score1 = score2
            score2 = score3
            score3 = score

Of course you could then keep the users in separate variables and changed them together with the values.

Upvotes: 1

Aidan H

Reputation: 122

I did something similar to @Kristopher Ives, by keeping it as a OrderedDict you can still pull the username of the user.

scores = OrderedDict(sorted(spreadsheet.items(), key=lambda row:row[1]["highscores"]))

Upvotes: 0

Randy

Reputation: 14847

This is a task that is well-suited for pandas, and honestly, it's almost never worth bothering with Python's stock csv library:

In [36]: df = pd.read_csv("highscoreslist.csv")

In [37]: df.sort_values("highscores", ascending=False).head(3)
Out[37]:
  users  highscores
6     G       99032
5     F       89584
1     B       73201

(here's how that frame was created also):

In [32]: import pandas as pd

In [33]: import numpy as np

In [34]: df = pd.DataFrame.from_dict({'users': [chr(ord("A") + i) for i in range(10)],
    ...:                              'highscores': np.random.randint(0, 100000, 10)})

In [35]: df.to_csv("highscoreslist.csv", index=False)

Upvotes: 2

Kristopher Ives

Reputation: 6025

You can sort the spreadsheet by high score:

scores = sorted(scores, key=lambda row: row['highscores'], reverse=True)

Now the top three scores are in scores[0], scores[1] and scores[2]

My answer includes NO PANDAS

Upvotes: 2

How can I find the top three highest rows based on a column in a csv?

Answers (6)

Related Questions