jester112358

Reputation: 465

Data parsing, the pythonic way

fixtures.txt contains the Premier League fixtures for next season. The data looks like this:

foo@ubuntu:~/Desktop$ less fixtures.txt |head -n 4
8 August 2015
AFC Bournemouth v Aston Villa    #BOUAVL
Arsenal v West Ham United    #ARSWHU
Chelsea v Swansea City    #CHESWA

I'd like to rank the fixtures for each team. My approach looks clumsy and takes a lot of lines. What would be a more effective way to do this?

teams = {'BOU' : 4, 'WAT' : 4, 'LEI' : 4, 'NOR' : 4, 'AVL' : 3, 'SUN' : 3, 'NEW' : 3, 'WBA' : 3, 'STK' : 2, 'SWA' : 2, 'EVE': 2, 'SOU' : 2, 'CPL' : 2, 'TOT': 2, 'ARS' : 1, 'CHE' : 1, 'MUN' : 1, 'LIV' : 1, 'MCI' : 1}

fd = open("fixtures.txt", "r")

for lines in fd:
    lines = lines.strip()
    matches = lines.split("#")
    if "CHE" in lines:
        for k,v in teams.items():
            if k in matches[1]:
                if "CHE" not in k:
                    print k,v

outputs (Chelsea's first fixtures):

SWA 2
MCI 1
WBA 3
EVE 2
ARS 1
NEW 3
SOU 2
...

Upvotes: 1

Views: 85

Answers (1)

Zinob

Reputation: 103

What is best depends on how much data you have to process. Half the problem in this case is that the data retrieval and the printing are all jumbled up. Unless you are working with really large amounts of data it is advisable to split them up.

If the amount of data is small (less than a few hundred matches) you can read all the data into a list like so:

teams = {'BOU' : 4, 'WAT' : 4, 'LEI' : 4, 'NOR' : 4, 'AVL' : 3, 'SUN' : 3, 'NEW' : 3, 'WBA' : 3, 'STK' : 2, 'SWA' : 2, 'EVE': 2, 'SOU' : 2, 'CPL' : 2, 'TOT': 2, 'ARS' : 1, 'CHE' : 1, 'MUN' : 1, 'LIV' : 1, 'MCI' : 1}

def read_fix(filename):
    """Reads a named fixtures file and returns a list of team-code pairs: [["BOU","AVL"],["ARS","WHU"],...]"""
    matches = []  # Create an empty list; this is part of the so-called accumulator pattern.
    with open(filename, "r") as fd:  # The with statement guarantees that the opened file is closed when the block ends.
        for line in fd:
            parts = line.strip().split("#")  # Method calls can be chained: this strips whitespace, then splits on "#".
            if len(parts) == 2:  # Keep only lines that contain a game (or, well, exactly one "#"; possibly dangerous).
                code = parts[1]  # Python indexes lists and strings from 0, so this is the part after the "#".
                matches.append([code[0:3], code[3:6]])  # Split the code into two 3-letter team codes and add the pair to the list (the accumulation step of the accumulator pattern).
    return matches
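
The check for exactly one "#" is fairly loose. As a rough sketch of a stricter alternative, a regular expression could accept only lines that end in a six-letter code. The pattern and the helper name parse_fixture are my own assumptions, based only on the sample lines in the question (three capital letters per team):

```python
import re

# Assumed format: a '#' followed by exactly six capital letters at the end of the line.
FIXTURE_CODE = re.compile(r"#([A-Z]{3})([A-Z]{3})\s*$")

def parse_fixture(line):
    """Return [home, away] for a fixture line, or None for any other line (dates, blanks)."""
    found = FIXTURE_CODE.search(line)
    if found is None:
        return None
    return [found.group(1), found.group(2)]
```

Inside the loop, a pair would then be appended only when parse_fixture(line) returns something other than None.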

And then use another function to print it:

def print_fix(games,team):
    """Takes a list of games (as team pairs) and prints information for the opposing team in each match that contains the specified team."""
    team = team.upper()  # just for convenience
    for game in games:
        if team in game:  # The in operator returns True if team is equal to at least one of the teams in the game list.
            # For example: team in ["CHE","LEI"] would return True if team was either "CHE" or "LEI"
            if game[0] == team:  # If "my team" is the first one of the pair, then the other team must be the second, and the other way around.
                other = game[1]
            else:
                other = game[0]
            if other in teams:  # Skip opponents that have no entry in teams (e.g. "WHU"), which would otherwise raise a KeyError.
                print other, teams[other]

matches= read_fix("fixtures.txt")
print_fix(matches,"CHE") 

A much more efficient way is of course to use a dict for the temporary storage, but I think this code might be slightly easier to read.
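
For what it's worth, the dict-based storage could look roughly like this. It is only a sketch: index_fixtures is a hypothetical helper name, the sample pairs and the two ranks are illustrative values taken from the question, and the input is assumed to be a list of pairs in the shape read_fix returns.

```python
from collections import defaultdict

def index_fixtures(matches):
    """Map each team code to the list of its opponents, in fixture order."""
    opponents = defaultdict(list)
    for home, away in matches:
        opponents[home].append(away)
        opponents[away].append(home)
    return opponents

# Illustrative pairs in the shape read_fix returns, plus a cut-down ranking dict.
sample = [["BOU", "AVL"], ["ARS", "WHU"], ["CHE", "SWA"], ["MCI", "CHE"]]
ranks = {"SWA": 2, "MCI": 1}

by_team = index_fixtures(sample)
for other in by_team["CHE"]:
    if other in ranks:  # opponents without a rank are skipped
        print("%s %s" % (other, ranks[other]))
```

The file is still read only once, but looking up a team's opponents afterwards is a single dict access instead of a scan over every game.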

Upvotes: 2
