Reputation: 1811
I have an Excel sheet with date, time, and temperature columns.
Using Python, I want to extract this data into Python arrays.
Each array would hold the date in position 0, followed by that day's temps in the remaining positions, like this:
temparray[0] = [20130102,34.75,34.66,34.6,34.6,....,34.86]
temparray[1] = [20130103,34.65,34.65,34.73,34.81,....,34.64]
Here is my attempt, but it doesn't do what I want:
from xlrd import open_workbook

wb = open_workbook('temp.xlsx')
for s in wb.sheets():
    for row in range(s.nrows):
        # Collect every cell value in this row
        values = []
        for col in range(s.ncols):
            values.append(s.cell(row, col).value)
        print(values[0])
        print("%.2f" % values[1])
I used xlrd, but I am open to using anything. Thank you for your help.
Upvotes: 1
Views: 8832
Reputation: 736
If you want to keep all the data for each date together, I'd suggest using a dictionary that maps each date to a list of its temps. Once the dict is populated with your data, you can rearrange it however you like. Try something like this after wb = open_workbook('temp.xlsx'):
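tmpDict = {}
for s in wb.sheets():
    for row in range(s.nrows):
        # Use the cell's value (not the Cell object) as the dict key
        date = s.cell(row, 0).value
        try:
            tmpDict[date].append(s.cell(row, 2).value)
        except KeyError:
            # First temp seen for this date
            tmpDict[date] = [s.cell(row, 2).value]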
If you print tmpDict, you should get an output like:
{date1: [temp1, temp2, temp3, ...],
date2: [temp1, temp2, temp3, ...]
...}
Before Python 3.7, dictionary keys were kept in an arbitrary order (it has to do with the hash value of the key); newer versions preserve insertion order. Either way, you can construct a list of lists sorted by date from the content of the dict like so:
tmpList = []
for key in sorted(tmpDict):
    # Each entry starts with the date, followed by that date's temps
    valList = [key]
    valList.extend(tmpDict[key])
    tmpList.append(valList)
Then you'll get a list of lists ordered by date, each holding the date followed by its temps, which is the layout you were originally after. However, you can always get to the values in the dictionary by using the keys. I typically find the dictionary easier to work with afterwards, but you can change it to any form you need.
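As a rough illustration of the dict-then-list approach described above, here is a minimal sketch that runs without a spreadsheet. The `rows` list is hypothetical sample data standing in for the (date, time, temp) values you would read with s.cell(row, col).value, and collections.defaultdict is swapped in for the try/except so that missing keys start out as empty lists.

```python
from collections import defaultdict

# Hypothetical sample rows: (date, time, temp). In the real script these
# values would come from s.cell(row, col).value for columns 0, 1 and 2.
rows = [
    (20130102, "00:00", 34.75),
    (20130102, "00:15", 34.66),
    (20130103, "00:00", 34.65),
    (20130103, "00:15", 34.73),
]

# Group temps by date; a missing date starts as an empty list.
temps_by_date = defaultdict(list)
for date, _time, temp in rows:
    temps_by_date[date].append(temp)

# Build the list-of-lists layout from the question, sorted by date.
temparray = [[date] + temps for date, temps in sorted(temps_by_date.items())]
print(temparray)
```

Because the grouping goes through a dict, this version does not depend on the rows already being sorted by date.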
Upvotes: 1
Reputation: 365647
From what I understand of your question, the problem is that you want the output to be a list of lists, and you're not getting such a thing.
And that's because there's nothing in your code that even tries to get such a thing. For each row, you build a list, print out the first value of that list, print out the second value of that list, and then forget the list.
To append each of those row lists to a big list of lists, all you have to do is exactly the same thing you're doing to append each column value to the row lists:
temparray = []
for row in range(s.nrows):
    values = []
    for col in range(s.ncols):
        values.append(s.cell(row, col).value)
    temparray.append(values)
From your comment, it looks like what you actually want is not only this, but also grouping the temperatures together by day, and also only adding the second column, rather than all of the values, for each day. Which is not at all what you described in the question. In that case, you shouldn't be looping over the columns at all. What you want is something like this:
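days = []
current_day, current_date = [], None
for row in range(s.nrows):
    # Compare cell values, not Cell objects
    date = s.cell(row, 0).value
    if date != current_date:
        # New date: start a new list with the date in position 0
        current_day, current_date = [date], date
        days.append(current_day)
    current_day.append(s.cell(row, 2).value)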
This code assumes that the dates are always in sorted order, as they are in your input screenshot.
I would probably structure this differently, building a row iterator to pass to itertools.groupby, but I wanted to keep this as novice-friendly, and as close to your original code, as possible.
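For reference, the itertools.groupby variant mentioned above might look roughly like the sketch below. It assumes the rows have already been read into (date, time, temp) tuples; the `rows` list here is hypothetical sample data, not output from xlrd. Like the loop above, it relies on the dates being in sorted order, because groupby only merges adjacent rows that share a key.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (date, time, temp), already sorted by date.
rows = [
    (20130102, "00:00", 34.75),
    (20130102, "00:15", 34.66),
    (20130103, "00:00", 34.65),
    (20130103, "00:15", 34.73),
]

# groupby yields (key, group) pairs for each run of consecutive rows
# with the same date; each group is an iterator over those rows.
temparray = [
    [date] + [temp for _date, _time, temp in group]
    for date, group in groupby(rows, key=itemgetter(0))
]
print(temparray)
```

If the rows could arrive unsorted, sorting them by date first (for example with sorted(rows, key=itemgetter(0))) would keep each date in a single group.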
Also, I suspect you really don't want this:
[[date1, temp1a, temp1b, temp1c],
[date2, temp2a, temp2b]]
… but rather something like this:
{date1: [temp1a, temp1b, temp1c],
date2: [temp2a, temp2b]}
But without knowing what you're intending to do with this info, I can't tell you how best to store it.
Upvotes: 3