mjolnir

Reputation: 61

Python: Import from text file to list and sort/average based on multiple columns

I have a text file that looks like this:

Mike 5 7 9
Terry 3 7 4
Ste 8 2 3

I wrote the following program to get each person's highest score and sort the results alphabetically by name:

def alphabetical():
    scoreslist = []
    with open ("classa.txt") as f:
        content = f.read().splitlines()
        for line in content:
            splitline = line.split(" ")
            name = splitline[0]
            score = splitline[1:]
            highscore = sorted(score)[-1]
            scoreslist.append("{} {}".format(name,highscore))

    scoreslist.sort(key=lambda x: x[0])
    print(scoreslist)

The final output looks like this:

Mike 9
Ste 8
Terry 7

I'm happy with the function at the moment but I feel that it could be a little more concise. Is there a simpler way?

More importantly, I want to take the original file and use the same method to create an average of the numbers in the original text file and output in the same format. I thought there might be a simple average function that I could use but this obviously isn't happening:

score = splitline.avg[-1:-3]

Upvotes: 1

Views: 1642

Answers (3)

Padraic Cunningham

Reputation: 180441

You can use statistics.mean to calculate your averages and the csv lib to parse your file into rows. You never need to call read unless you actually want a single string of all the file content; you can iterate over the file object and split each line instead.

from statistics import mean
import csv

def sort_mean(fle):
    with open(fle) as f:
        for name, *scores in csv.reader(f, delimiter=" "):
            srt = sorted(map(int, scores))
            print("Highest score for {} is  {}".format(name, srt[-1]))
            print("Average score for {} is {}".format(name, mean(srt)))

For your input file it would output:

Highest score for Mike is  9
Average score for Mike is 7.0
Highest score for Terry is  7
Average score for Terry is 4.666666666666667
Highest score for Ste is  8
Average score for Ste is 4.333333333333333

Now if you want to store all that data and output it ordered:

from statistics import mean
import csv
from operator import itemgetter


def sort_mean(fle):
    avgs, high = [], []
    with open(fle) as f:
        for name, *scores in csv.reader(f, delimiter=" "):
            srt = list(map(int, scores))
            avgs.append((name, mean(srt)))
            high.append((name, max(srt)))
    avgs.sort(key=itemgetter(1), reverse=True)
    high.sort(key=itemgetter(1), reverse=True)
    return avgs, high

That will give you two lists sorted from highest to lowest:

In [10]: avgs, high = sort_mean("in.txt")

In [11]: avgs
Out[11]: [('Mike', 7.0), ('Terry', 4.666666666666667), ('Ste', 4.333333333333333)]

In [12]: high
Out[12]: [('Mike', 9), ('Ste', 8), ('Terry', 7)]

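For Python 2 you will need to calculate the average yourself, since the statistics module is not available there, and the loop logic is a little different because extended unpacking (name, *scores) is Python 3 only:

import csv
from operator import itemgetter


def sort_mean(fle):
    avgs, high = [], []
    with open(fle) as f:
        for row in csv.reader(f, delimiter=" "):
            name, scores = row[0], row[1:]
            # map returns a list in Python 2, so len() works on it
            srt = map(int, scores)
            # start the sum at 0.0 to force float division
            avgs.append((name, sum(srt, 0.0) / len(srt)))
            high.append((name, max(srt)))
    avgs.sort(key=itemgetter(1), reverse=True)
    high.sort(key=itemgetter(1), reverse=True)
    return avgs, high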

Instead of two lists, you could store a dict of dicts holding each user's highest score and mean, and sort the items stored in that.
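
A minimal sketch of that dict-of-dicts idea, for Python 3. The function name score_summary and the keys "high" and "mean" are made up for illustration, and io.StringIO stands in for the open file so the example runs on its own:

```python
import csv
import io
from statistics import mean


def score_summary(lines):
    """Map each name to a dict holding that person's highest and mean score."""
    summary = {}
    for name, *scores in csv.reader(lines, delimiter=" "):
        nums = [int(s) for s in scores]
        summary[name] = {"high": max(nums), "mean": mean(nums)}
    return summary


# Sample data copied from the question; StringIO stands in for an open file.
sample = io.StringIO("Mike 5 7 9\nTerry 3 7 4\nSte 8 2 3\n")
summary = score_summary(sample)

# Sort the (name, stats) items by whichever statistic is needed.
by_mean = sorted(summary.items(), key=lambda item: item[1]["mean"], reverse=True)
by_high = sorted(summary.items(), key=lambda item: item[1]["high"], reverse=True)

for name, stats in by_mean:
    print(name, stats["high"], round(stats["mean"], 2))
```

Keeping both statistics under one name means the file is parsed once and each ordering is just a different sort key.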

In regard to your own function, you could rewrite it as the following:

def alphabetical():
    scores_list = []
    with open("classa.txt") as f:
        # just iterate over the file object
        # line by line
        for line in f:
            # don't need to pass a delimiter
            split_line = line.split()
            name = split_line[0]
            score = split_line[1:]
            # use max to get the highscore and use int as the key,
            # otherwise the strings compare as text and "123" < "2"
            high_score = max(score, key=int)
            scores_list.append("{} {}".format(name, high_score))
    # don't need lambda to sort alphabetically
    scores_list.sort()
    print(scores_list)

Upvotes: 3

mjolnir

Reputation: 61

Ok, I gave it a little thought and this seems to work fine. As with all my code it isn't pretty.

scoreslist = []
with open (classchoice) as f:
    content = f.read().splitlines()
    for line in content:
        splitline = line.split(" ") #splits each line by Space
        name = splitline[0]
        total = int(splitline[-1]) + int(splitline[-2]) + int(splitline[-3]) #I created a total by adding the last three values in the text file
        average = (total/3) #then divided them by 3
        scoreslist.append("{} {}".format(name,average)) #changed the output to feature average instead of high score
scoreslist.sort(key=lambda x: x[0])
print(scoreslist)

It seems to work but I assumed there would be a function such as min, max, mean, average that could just be plugged in.

I am very much a beginner at this, and I must admit that pandas isn't something I've used (or seen) before, but thank you for the assistance with it, paljenczy.

Upvotes: 0

paljenczy

Reputation: 4899

For your averaging problem, either use sum(x) / len(x) to calculate it manually, or use the mean function from the statistics module, as suggested in another answer.
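
As a rough illustration of both options in Python 3, using Terry's row from the question (the variable names here are only for the example). Note that the fields come out of split() as strings, so they have to be converted before averaging; a list has no avg attribute, which is why the attempt in the question fails:

```python
from statistics import mean

# The fields after the name are still strings at this point.
scores = ["3", "7", "4"]
numbers = [int(s) for s in scores]

# Manual average: / is true division in Python 3.
manual_average = sum(numbers) / len(numbers)

# Same result from the standard library (Python 3.4+).
library_average = mean(numbers)

print(manual_average, library_average)
```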

In general, for problems like yours, use the pandas module for data analysis. Note that this is an external package that has to be installed before it can be imported. For tutorials, see the pandas documentation.

import pandas as pd

df = pd.read_table("classa.txt", sep=" ", header=None,
                   names = ["name", "score1", "score2", "score3"])

df["max_score"] = df[["score1", "score2", "score3"]].max(axis = 1)

df_sorted = df[["name", "max_score"]].sort_values(by = "max_score",
                                                  ascending = False)


>>> df_sorted 
    name  max_score
0   Mike          9
2    Ste          8
1  Terry          7

Check the .mean() method of pandas DataFrame objects for taking averages. For writing the resulting DataFrame, check the .to_csv method.
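
A sketch of how those two methods might fit together, assuming pandas is installed. The column name avg_score is made up for the example, and io.StringIO buffers stand in for classa.txt and for the output file so the snippet is self-contained:

```python
import io

import pandas as pd

# Sample data copied from the question; StringIO stands in for classa.txt.
data = io.StringIO("Mike 5 7 9\nTerry 3 7 4\nSte 8 2 3\n")
score_cols = ["score1", "score2", "score3"]
df = pd.read_csv(data, sep=" ", header=None, names=["name"] + score_cols)

# Row-wise mean across the three score columns.
df["avg_score"] = df[score_cols].mean(axis=1)

# Sort alphabetically by name, as in the question's output.
df_avg = df[["name", "avg_score"]].sort_values(by="name")

# to_csv accepts a path or a file-like object; a buffer is used here.
out = io.StringIO()
df_avg.to_csv(out, sep=" ", header=False, index=False)
print(out.getvalue())
```

Passing a filename to to_csv instead of the buffer would write the same rows to disk.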

Upvotes: 2
