Reputation: 2956
I am converting to Python and NumPy from IDL (kinda like Matlab). This is kinda an open question about handling data. Maybe someone can help.
The usual situation with my data is that I have a fixed class of data, perhaps from a spreadsheet, database etc. I am trying to figure out what kind of data structures are best to use in python and numpy.
I know about the csv module and can use csv.DictReader() to read a spreadsheet. This reads line by line and makes a dictionary with the proper names from the spreadsheet header (first line).
    import csv

    f = open(file, 'rU')
    dat = csv.DictReader(f)
    data = []  # makes an empty list
    i = 0
    for row in dat:
        data.append(row)
        if i == 0:
            # print the column names once, from the first row
            keys = row.keys()
            print "keys"
            print keys
            print
        i = i + 1
    f.close()
First of all, that is kinda a lot of code to read a csv file into a list of dictionaries and get the keys. Is there a faster/better way?
But now, I wonder whether an array of dictionaries is really what I want. Should I make a class of objects and make this an array of objects? Or something else?
If I have my array of dictionaries, "data", I would get some "column" like age=array([dat["age"] for dat in data])
Is that the right way to do it? Is there no way like "age=data->age" that would do it faster?
Would appreciate some guidance. Thanks.
Upvotes: 2
Views: 1531
Reputation: 284672
Seeing as how you explicitly mention using numpy, consider something like the following:
import numpy as np
data = np.genfromtxt('data.txt', delimiter=',', names=True)
print data['item1']
Or
import numpy as np
item1, item2, item3 = np.loadtxt('data.txt', delimiter=',', skiprows=1).T
Where the format of data.txt is something along these lines (i.e. comma delimited):
item1, item2, item3
1.0, 2.0, 3.0
4.0, 5.0, 6.0
7.0, 8.0, 9.0
The first example uses structured arrays, while the second is just unpacking the columns (thus the transpose, .T) into three variables.
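For anyone who wants to try both approaches without creating a file, here is a minimal self-contained sketch in Python 3 syntax. It assumes a reasonably recent NumPy that accepts text-mode file-like objects; the in-memory `text` string is made-up sample data standing in for data.txt, with the header written without spaces.

```python
import io
import numpy as np

# Hypothetical sample data, mirroring the layout of data.txt above.
text = "item1,item2,item3\n1.0,2.0,3.0\n4.0,5.0,6.0\n7.0,8.0,9.0\n"

# Structured array: the header row becomes the field names,
# and each column is then reachable by name.
data = np.genfromtxt(io.StringIO(text), delimiter=',', names=True)
print(data.dtype.names)
print(data['item1'])

# Plain 2-D float array; skiprows=1 skips the header, and .T turns
# columns into rows so they can be unpacked into separate variables.
item1, item2, item3 = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1).T
print(item2.mean())
```

The structured array is closest to the "data->age" style of access asked about in the question, while the unpacking form is handy when there are only a few purely numeric columns.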
Upvotes: 2
Reputation: 40340
If you're working with spreadsheet-type data a lot, I'd strongly recommend using pandas, a Python package designed for this sort of thing. You just do:
pandas.read_csv(file)
That gives you a DataFrame, which does all sorts of fancy indexing, and is nice and fast.
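As a rough illustration of what that looks like in practice, here is a small sketch in Python 3 syntax. The column names (`name`, `age`) and the values are invented for the example, and an in-memory string stands in for the real file.

```python
import io
import pandas as pd

# Hypothetical spreadsheet contents; a real call would pass a file path.
csv_text = "name,age\nalice,34\nbob,27\ncarol,45\n"

df = pd.read_csv(io.StringIO(csv_text))

# Column names come from the header row.
columns = list(df.columns)
print(columns)

# A single column, converted to a NumPy array.
age = df['age'].to_numpy()
print(age)

# Boolean indexing: keep only the rows where age is above 30.
older = df[df['age'] > 30]
print(older)
```

Note that `read_csv` infers numeric types for you, so `age` comes back as integers rather than strings.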
Upvotes: 5
Reputation: 249303
Doing it the way you are is OK, though your code can easily be made more concise:
data = list(csv.DictReader(open(file, 'rU')))
print "keys", data[0].keys()
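If you stay with the csv module, a sketch along these lines (Python 3 syntax, with invented sample data in place of your file) also closes the file for you and shows one way to pull out a column. Keep in mind that csv.DictReader returns every value as a string, so numeric columns need an explicit conversion.

```python
import csv
import io

# Hypothetical sample data; the column names are made up for illustration.
csv_text = "name,age\nalice,34\nbob,27\n"

# io.StringIO stands in for a real file here. With a file on disk,
# `with open(path, newline='') as f:` closes it automatically.
with io.StringIO(csv_text) as f:
    reader = csv.DictReader(f)
    data = list(reader)        # list of dictionaries, one per row
    keys = reader.fieldnames   # column names, in header order

print("keys", keys)

# Extract one "column", converting the strings to floats.
age = [float(row['age']) for row in data]
print(age)
```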
Upvotes: 0