Reputation: 79
Let's say I have a 'players.csv' file, shown below, with data on some NFL players. My goal is to read the file and create a dictionary whose keys are the players' heights and whose values are lists of player profiles (each profile is a tuple).
HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT
6,Aaron,2005,31,QB,225
5,Jordy,2008,30,WR,217
5,Randall,2011,24,WR,192
Player profile tuple example: the 'name' must be a string, and 'age' and 'position' must be integers. The 'year' drafted and 'position' must be ignored.
player_profile = (name, age, position)
Expected dictionary:
# players' heights are keys, lists of player profiles are values.
dict = {
    6: [('Aaron', 31, 225)],
    5: [('Jordy', 30, 217), ('Randall', 24, 192)]
}
Below is what I have so far and I'm stuck.
final_dict = {}
# open csv file
with open(filename) as f:
    info = f.read()
# split on the newline characters
info2 = info.split()
# exclude the header
info3 = info2[1:]
Upvotes: 3
Views: 581
Reputation: 180481
Use the csv module with a defaultdict to handle repeating keys:
import csv
from collections import defaultdict

d = defaultdict(list)
with open("in.csv") as f:
    next(f)  # skip header
    r = csv.reader(f)
    # unpack each row, use height as key and append name, age and position
    for h, nm, _, a, p, _ in r:
        d[int(h)].append((nm, int(a), p))
print(d)
Output:
defaultdict(<type 'list'>, {5: [('Jordy', 30, 'WR'), ('Randall', 24, 'WR')], 6: [('Aaron', 31, 'QB')]})
If you really want to avoid imports, you can use str.split and dict.setdefault, but I see no reason not to use standard library modules like csv and collections:
d = {}
with open("in.csv") as f:
    next(f)  # skip header
    for line in f:
        h, nm, _, a, p, _ = line.split(",")
        d.setdefault(int(h), []).append((nm, int(a), p))
print(d)
Output:
{5: [('Jordy', 30, 'WR'), ('Randall', 24, 'WR')], 6: [('Aaron', 31, 'QB')]}
Your example is inconsistent, as POSITION is a string; you should be taking WEIGHT to match your expected output:
with open("in.csv") as f:
    next(f)  # skip header
    r = csv.reader(f)
    # unpack each row, use height as key and append name, age and weight
    for h, nm, _, a, _, w in r:
        d[int(h)].append((nm, int(a), int(w)))
Output:
defaultdict(<type 'list'>, {5: [('Jordy', 30, 217), ('Randall', 24, 192)], 6: [('Aaron', 31, 225)]})
Make the same changes using the normal dict to get the same output.
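For reference, the plain-dict variant with WEIGHT might look like the sketch below. It is only an illustration: it reads the question's sample rows from an in-memory string (io.StringIO) instead of in.csv so that it runs on its own.

```python
import io

# Sample rows from the question; io.StringIO stands in for open("in.csv").
data = """HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT
6,Aaron,2005,31,QB,225
5,Jordy,2008,30,WR,217
5,Randall,2011,24,WR,192
"""

d = {}
with io.StringIO(data) as f:
    next(f)  # skip header
    for line in f:
        # strip the trailing newline before splitting on commas
        h, nm, _, a, _, w = line.strip().split(",")
        d.setdefault(int(h), []).append((nm, int(a), int(w)))

print(d)
```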
Upvotes: 2
Reputation: 1334
I think this is the most basic solution to this question
from collections import defaultdict

players = defaultdict(list)
with open("players.csv") as f:
    for line in f:
        line = line.strip()
        tokens = line.split(",")
        xs = [tokens[1], tokens[3], tokens[5]]
        players[tokens[0]].append(tuple(xs))
First, you define a defaultdict with list as the default value. Then you go through the file line by line, stripping special characters such as "\n". Next, each line is split on ",", so we know where everything is: the height is at index 0, so that is our key, and the other attributes are at indexes 1, 3 and 5, so we put those tokens in a list. The list exists only so that it can be converted to a tuple. It is the easiest solution. We could also write it like this:
players[tokens[0]].append((tokens[1], tokens[3], tokens[5]))
It would also work :)
Regards, golobich
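Note that the loop above also processes the header line and keeps every field as a string. One possible variant that skips the header and converts the numeric fields is sketched here, assuming the column layout from the question. The name `load_players` is a hypothetical helper, and the demo writes the sample rows to a temporary file only so that the block runs on its own.

```python
import os
import tempfile
from collections import defaultdict


def load_players(path):
    """Map height -> list of (name, age, weight) tuples (hypothetical helper)."""
    players = defaultdict(list)
    with open(path) as f:
        next(f)  # skip the header row
        for line in f:
            line = line.strip()
            if not line:  # ignore blank lines
                continue
            tokens = line.split(",")
            # height is the key; age and weight are converted to int
            players[int(tokens[0])].append(
                (tokens[1], int(tokens[3]), int(tokens[5]))
            )
    return players


# Demo with the rows from the question, written to a temporary file.
sample = (
    "HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT\n"
    "6,Aaron,2005,31,QB,225\n"
    "5,Jordy,2008,30,WR,217\n"
    "5,Randall,2011,24,WR,192\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
players = load_players(tmp.name)
os.remove(tmp.name)
print(dict(players))
```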
Upvotes: 0
Reputation: 2368
The problem with the csv module is that it doesn't handle data type conversion automatically: every field comes back as a string, which is why Padraic's answer has to call int on the height, age and weight. This in turn means that you need an additional conversion step, possibly with map, to cast the strings to their right types. Furthermore, it is likely that once you have read your file, you will want to perform some sort of analysis or other processing on its contents.
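The extra conversion step could be as small as the sketch below, which applies int to the numeric columns with map while reading. The in-memory sample stands in for the real file, and the column positions are assumed from the header posted in the question.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question; io.StringIO stands in for the real file.
data = """HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT
6,Aaron,2005,31,QB,225
5,Jordy,2008,30,WR,217
5,Randall,2011,24,WR,192
"""

by_height = defaultdict(list)
reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header row
for row in reader:
    # csv.reader yields strings only; cast HEIGHT, AGE and WEIGHT to int
    height, age, weight = map(int, (row[0], row[3], row[5]))
    by_height[height].append((row[1], age, weight))

print(dict(by_height))
```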
For this reason, I would like to suggest working with a pandas.DataFrame, which offers behaviour similar to that of a dictionary, as follows:
import pandas
Q = pandas.read_csv("myfile.csv", index_col = "HEIGHT")
Q is now a DataFrame. To retrieve all players with a height of 5:
Q.loc[5]  # Returns two rows according to the data posted in the question.
To get the median age of players of height 5:
Q.loc[5]["AGE"].median()  # 27.0 according to the data posted in the question.
For more information on pandas please see this link.
Hope this helps.
Upvotes: 0