extract information from excel into python 2d array

Question

I have an Excel sheet with dates, times, and temperatures that looks like this:

[image: screenshot of the Excel sheet]

Using Python, I want to extract this info into Python arrays.

Each array would hold the date in position 0, followed by that day's temps in the remaining positions, like this:

temparray[0] = [20130102,34.75,34.66,34.6,34.6,....,34.86]
temparray[1] = [20130103,34.65,34.65,34.73,34.81,....,34.64]

Here is my attempt, but it sucks:

from xlrd import open_workbook

print(open_workbook('temp.xlsx'))

wb = open_workbook('temp.xlsx')

for s in wb.sheets():
    for row in range(s.nrows):
        values = []
        for col in range(s.ncols):
            values.append(s.cell(row,col).value)
        print(values[0])
        print("%.2f" % values[1])
        print()

I used xlrd, but I am open to using anything. Thank you for your help.

abarnert · Accepted Answer

From what I understand of your question, the problem is that you want the output to be a list of lists, and you're not getting such a thing.

And that's because there's nothing in your code that even tries to get such a thing. For each row, you build a list, print out the first value of that list, print out the second value of that list, and then forget the list.

To append each of those row lists to a big list of lists, all you have to do is exactly the same thing you're doing to append each column value to the row lists:

temparray = []
for row in range(s.nrows):
    values = []
    for col in range(s.ncols):
        values.append(s.cell(row,col).value)
    temparray.append(values)
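If you prefer, the same loop can be written as a nested list comprehension. This is only a sketch: `sheet_to_rows` is a name I made up, and `FakeSheet` and `FakeCell` are stand-ins so the snippet runs without a workbook. They mimic only the `nrows`, `ncols`, and `cell(row, col).value` parts of an xlrd sheet that your code already uses, and the sample values are invented.

```python
from collections import namedtuple

# Hypothetical stand-ins for an xlrd sheet and cell, for illustration only.
FakeCell = namedtuple("FakeCell", "value")


class FakeSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, row, col):
        return FakeCell(self._rows[row][col])


def sheet_to_rows(sheet):
    """Return every cell value of the sheet as a list of row lists."""
    return [[sheet.cell(row, col).value for col in range(sheet.ncols)]
            for row in range(sheet.nrows)]


# Invented sample data: date, hour, temperature.
sheet = FakeSheet([[20130102, 0, 34.75], [20130102, 1, 34.66]])
temparray = sheet_to_rows(sheet)
print(temparray)
```

With a real workbook you would pass one of the sheets from `wb.sheets()` instead of `FakeSheet`.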

From your comment, it looks like what you actually want is more than this: you also want to group the temperatures together by day, and to add only the temperature column, rather than all of the values, for each day. That is not at all what you described in the question. In that case, you shouldn't be looping over the columns at all. What you want is something like this:

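days = []
current_day, current_date = [], None
for row in range(s.nrows):
    # Use .value so we compare and store plain values, not Cell objects
    date = s.cell(row, 0).value
    if date != current_date:
        # A new date starts a new list of temperatures
        current_day, current_date = [], date
        days.append(current_day)
    current_day.append(s.cell(row, 2).value)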

This code assumes that the dates are always in sorted order, as they are in your input screenshot.

I would probably structure this differently, building a row iterator to pass to itertools.groupby, but I wanted to keep this as novice-friendly, and as close to your original code, as possible.
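For what it's worth, here is roughly what the `itertools.groupby` version might look like. Treat it as a sketch under assumptions: `group_by_date` is a made-up name, the rows are plain `(date, time, temp)` tuples with the temperature at index 2, and the sample values are invented. Like the loop, it only works if the rows are already sorted by date, because `groupby` only groups consecutive items.

```python
from itertools import groupby
from operator import itemgetter


def group_by_date(rows):
    """Turn (date, time, temp) rows into [date, temp, temp, ...] lists.

    Assumes rows are sorted by date; groupby only merges adjacent rows.
    """
    result = []
    for date, day_rows in groupby(rows, key=itemgetter(0)):
        # Put the date in position 0, then that day's temperatures.
        result.append([date] + [row[2] for row in day_rows])
    return result


# Invented sample rows standing in for the spreadsheet contents.
rows = [
    (20130102, "00:00", 34.75),
    (20130102, "01:00", 34.66),
    (20130103, "00:00", 34.65),
]
temparray = group_by_date(rows)
print(temparray)
```

With xlrd, the rows could come from something like `(s.row_values(r) for r in range(s.nrows))`. Unlike the loop, this version puts the date in position 0 of each list, which is the layout you asked for in the question.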

Also, I suspect you really don't want this:

[[date1, temp1a, temp1b, temp1c], 
 [date2, temp2a, temp2b]]

… but rather something like this:

{date1: [temp1a, temp1b, temp1c], 
 date2: [temp2a, temp2b]}

But without knowing what you're intending to do with this info, I can't tell you how best to store it.
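If the dict does turn out to be what you want, one way to build it is with `collections.defaultdict`. Again, this is a sketch: `temps_by_date` is a made-up name, the rows are assumed to be `(date, time, temp)` tuples with the temperature at index 2, and the sample values are invented.

```python
from collections import defaultdict


def temps_by_date(rows):
    """Map each date to the list of temperatures recorded on it."""
    temps = defaultdict(list)
    for row in rows:
        # row[0] is the date, row[2] is the temperature (assumed layout).
        temps[row[0]].append(row[2])
    return dict(temps)


# Invented sample rows standing in for the spreadsheet contents.
rows = [
    (20130102, "00:00", 34.75),
    (20130102, "01:00", 34.66),
    (20130103, "00:00", 34.65),
]
by_date = temps_by_date(rows)
print(by_date)
```

A side benefit is that this does not depend on the rows being sorted by date, since each temperature is filed under its own date key.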
