Reputation: 29
I am loading data from excel to populate objects ie teacher(lastName, firstName, user_ID). These teacher objects are then put into a list, where further manipulation and analysis is done (ie looking for matches and other patterns). My approach to do this was use the xlrd module to load the data from excel.
for row in range(11) :
lastName = sheet1.cell_value(row,3)
firstName = sheet1.cell_value(row,1)
userID = sheet1.cell_value(row,0)
profList.append(tempProf)
So this method takes my computer like 20 seconds to do for just 12 rows and I need to do this for 10,000. My question- is there another way to load in data like this in a more efficient manner? I am open to out scoping, so if this question sounds stupid, I would love to hear alternatives, thanks
Upvotes: 0
Views: 770
Reputation: 142651
You could use pandas
to load data directly from excel to pandas
DataFrame
import pandas as pd
df = pd.read_excel('filename')
and then you can use pandas
for fast manipulation and analysis.
Lastly you can save it in Excel, CSV, JSON, HTML, LaTeX or maybe even directly in database.
So maybe you would not need some "teacher object".
pandas - Python Data Analysis Library
Pandas Cookbook
Pandas Lessons
Upvotes: 1
Reputation: 164
XLRD provides a row(row#) method to, which should save you 2 trips back to the file.
for row in range(11) :
userID, firstName, junk, lastName = sheet1.row_values(row, 0, 3)
documentation here: https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966#sheet.Sheet.row_values-method
You want to work with rows over cells in most cases, taking a slice out of the data set and working with it is more efficient then going back to the data set each time.
10000 rows is quite a bit, so loading it in will take a while. You may want to consider a format like XML, or JSON which usually have efficient search methods in their libraries, or a flat DB engine like SQLite.
Once you hit more then a thousand or so rows its usually a good idea to move to a programmer friendly format or a database, since those are geared towards efficiency.
*I'm guessing from the code above that there are 4 columns, the third one is ignored, hence the junk variable.
Upvotes: 0
Reputation: 19030
An alternative is to convert your Excel spreadsheets to CSV.
Then use the Python csv module from the standard library.
from csv import reader as csv_reader
with open("data.csv", "r") as fd:
for row in csv_reader(fd):
userid, firstname, lastname = row
Explanation:
csv.reader
object as csv_reader
(my personal convention).data.csv
for reading "r"
placing the resulting file object into fd
.csv_reader(file_object)
gives us.userid, firstname, lastname
and so we can unpack the list row
into referneces/variables. See: Tuples and SequencesAlso Note: If your data contains Unicode charaacters you will want to install and use the unicodecsv 3rd-party library from the Python Package Index (PyPi).
Upvotes: 5