Vincent Luc

Reputation: 79

Read a CSV file and create a dictionary?

Let's say I have the 'players.csv' file below with data on some NFL players. My goal is to read the file and create a dictionary whose keys are the players' heights and whose values are lists of player profiles (each profile is a tuple).

HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT
6,Aaron,2005,31,QB,225
5,Jordy,2008,30,WR,217
5,Randall,2011,24,WR,192

Player profile tuple example: the 'name' must be a string, and 'age' and 'position' must be integers. The 'year' drafted and 'position' must be ignored.

player_profile = (name, age, position)

Expected dictionary:

# Player heights are the keys; lists of player profiles are the values.
dict = {
    6: [('Aaron', 31, 225)],
    5: [('Jordy', 30, 217), ('Randall', 24, 192)]
}

Below is what I have so far and I'm stuck.

final_dict = {}

#open csv file
with open(filename) as f:
    info = f.read()

# split the text into lines
info2 = info.splitlines()

#exclude the header
info3 = info2[1:]

Upvotes: 3

Views: 581

Answers (3)

Padraic Cunningham

Reputation: 180481

Use the csv module with a defaultdict to handle repeating keys:

import csv
from collections import defaultdict

d = defaultdict(list)

with open("in.csv") as f:
    next(f) # skip header
    r = csv.reader(f)
    # unpack each row: height is the key; append (name, age, position)
    for h, nm, _, a, p, _ in r:
        d[int(h)].append((nm, int(a), p))

print(d)

Output:

defaultdict(<type 'list'>, {5: [('Jordy', 30, 'WR'), ('Randall', 24, 'WR')], 6: [('Aaron', 31, 'QB')]})

If you really want to avoid imports, you can use str.split and dict.setdefault, but I see no reason not to use standard library modules like csv and collections:

d = {}

with open("in.csv") as f:
    next(f)  # skip header
    for line in f:
        h, nm, _, a, p, _ = line.split(",")
        d.setdefault(int(h), []).append((nm, int(a), p))

print(d)

Output:

{5: [('Jordy', 30, 'WR'), ('Randall', 24, 'WR')], 6: [('Aaron', 31, 'QB')]}

Your input example is inconsistent: POSITION is a string, so to match your expected output you should be taking WEIGHT instead:

with open("in.csv") as f:
    next(f) # skip header
    r = csv.reader(f)
    # unpack each row: height is the key; append (name, age, weight)
    for h, nm, _, a, _, w in r:
        d[int(h)].append((nm, int(a), int(w)))

Output:

defaultdict(<type 'list'>, {5: [('Jordy', 30, 217), ('Randall', 24, 192)], 6: [('Aaron', 31, 225)]})

Make the same changes using the normal dict to get the same output.
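
For the plain-dict version, that change might look like the sketch below. It reads from an in-memory string rather than in.csv so that it runs on its own; the `build_players` helper and the `SAMPLE` text are illustrative names, not part of the code above.

```python
import io

# Sample data copied from the question; a real script would open the file instead.
SAMPLE = """HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT
6,Aaron,2005,31,QB,225
5,Jordy,2008,30,WR,217
5,Randall,2011,24,WR,192
"""


def build_players(f):
    """Group (name, age, weight) tuples by height using a plain dict."""
    d = {}
    next(f)  # skip the header row
    for line in f:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input
        h, nm, _, a, _, w = line.split(",")
        d.setdefault(int(h), []).append((nm, int(a), int(w)))
    return d


players = build_players(io.StringIO(SAMPLE))
print(players)
```

On Python 3.7 or later, where dicts keep insertion order, this prints {6: [('Aaron', 31, 225)], 5: [('Jordy', 30, 217), ('Randall', 24, 192)]}.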

Upvotes: 2

golobitch

Reputation: 1334

I think this is the most basic solution to this question:

from collections import defaultdict

players = defaultdict(list)
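with open("players.csv") as f:
    next(f)  # skip the header row
    for line in f:
        tokens = line.strip().split(",")
        # name, age and weight sit at indices 1, 3 and 5
        xs = [tokens[1], int(tokens[3]), int(tokens[5])]
        # the height at index 0 is the key
        players[int(tokens[0])].append(tuple(xs))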

First we define a defaultdict with list as the default value. Then we go through the file line by line, stripping trailing whitespace such as "\n" and splitting each line on ",". After that we know where everything is: the height is at index 0, so that is our key, and the other attributes we need are at indices 1, 3 and 5, so we collect those tokens in a list and then convert the list to a tuple. We could also build the tuple directly, like this:

players[int(tokens[0])].append((tokens[1], int(tokens[3]), int(tokens[5])))

It would also work :)
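
If counting positions by hand feels fragile, one variation is to look columns up by name with csv.DictReader from the standard library. This swaps in a different technique from the index-based code above. The sketch assumes the header names from the question and reads from an in-memory string so that it runs on its own; `players_by_height` and `SAMPLE` are illustrative names.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question; a real script would open players.csv.
SAMPLE = """HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT
6,Aaron,2005,31,QB,225
5,Jordy,2008,30,WR,217
5,Randall,2011,24,WR,192
"""


def players_by_height(f):
    """Group (name, age, weight) tuples by height, addressing columns by name."""
    players = defaultdict(list)
    # DictReader consumes the header row and uses it as the keys of each row.
    for row in csv.DictReader(f):
        profile = (row["NAME"], int(row["AGE"]), int(row["WEIGHT"]))
        players[int(row["HEIGHT"])].append(profile)
    return dict(players)  # hand back a plain dict


result = players_by_height(io.StringIO(SAMPLE))
print(result)
```

The code no longer depends on the column order, only on the header names.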

Regards, golobich

Upvotes: 0

A_A

Reputation: 2368

The problem with the csv module is that it doesn't handle data type conversion automatically: every field comes back as a string, which is why Padraic's answer has to call int() on the height and the age. You therefore need an explicit conversion step, either inline or as an additional pass (possibly with map), to cast the strings to their proper types. Furthermore, it is likely that once you have read your file, you will want to perform some sort of analysis or other processing on its contents.
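
A quick way to see this is to read a single row and inspect it. The sketch below uses an in-memory copy of the first data row from the question; `SAMPLE` and `converted` are illustrative names.

```python
import csv
import io

# Header plus the first data row from the question.
SAMPLE = "HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT\n6,Aaron,2005,31,QB,225\n"

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # ['HEIGHT', 'NAME', ...]
row = next(reader)
print(row)  # ['6', 'Aaron', '2005', '31', 'QB', '225'] (all strings)

# The numeric fields have to be converted explicitly.
converted = (int(row[0]), row[1], int(row[3]), int(row[5]))
print(converted)  # (6, 'Aaron', 31, 225)
```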

For this reason, I would like to suggest working with a pandas.DataFrame, which offers dictionary-like behaviour, as follows:

import pandas
Q = pandas.read_csv("myfile.csv", index_col="HEIGHT")

Q is now a DataFrame. To retrieve all players with a height of 5:

Q.loc[5]  # Returns two rows for the data posted in the question (.ix was removed in pandas 1.0).

To get the median age of players of height 5:

Q.loc[5]["AGE"].median()  # 27.0 for the data posted in the question.
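
If you still need the exact dictionary from the question, one possible approach is to group the frame by height and turn each group into a list of tuples. This is only a sketch: it assumes pandas is installed, takes WEIGHT as the third tuple element to match the expected output, and reads from an in-memory string so that it runs on its own. `SAMPLE`, `df` and `players` are illustrative names.

```python
import io

import pandas as pd

# Sample data copied from the question; a real script would pass the file path.
SAMPLE = """HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT
6,Aaron,2005,31,QB,225
5,Jordy,2008,30,WR,217
5,Randall,2011,24,WR,192
"""

df = pd.read_csv(io.StringIO(SAMPLE))  # numeric columns are parsed as integers

players = {}
for height, group in df.groupby("HEIGHT"):
    # itertuples(name=None) yields plain tuples in the selected column order
    rows = group[["NAME", "AGE", "WEIGHT"]].itertuples(index=False, name=None)
    players[int(height)] = [(name, int(age), int(weight)) for name, age, weight in rows]

# Median age of the players whose height is 5.
median_age = df.loc[df["HEIGHT"] == 5, "AGE"].median()
print(players)
print(median_age)
```

The explicit int() calls only make sure the dictionary holds built-in Python integers rather than NumPy scalar types.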

For more information on pandas, please see the pandas documentation.

Hope this helps.

Upvotes: 0
