Reputation: 75
I am writing Python code for a project that has to do three things: 1) read in data from an xls file column by column, 2) average each row of the columns in groups of three, and 3) average the resulting columns.
I have accomplished 1 and 2 but can't quite get 3. I think a lot of my trouble stems from the fact that I am using float, but I need the numbers to 6 decimal places. Any help and patience is appreciated; I'm very new to Python.
    v = open("Pt_2_Test_Data.xls", 'wb') #created file to write output to
    w = open("test2.xls")
    count = 0
    for row in w: #read in file
        for line in w:
            columns = line.split("\t") #split up into columns
            date = columns[0]
            time = columns[1]
            a = columns[2]
            b = columns[3]
            c = columns[4]
            d = columns[5]
            e = columns[6]
            f = columns[7]
            g = columns[8]
            h = columns[9]
            i = columns[10]
            j = columns[11]
            k = columns[12]
            l = columns[13]
            m = columns[14]
            n = columns[15]
            o = columns[16]
            p = columns[17]
            q = columns[18]
            r = columns[19]
            s = columns[20]
            t = columns[21]
            u = columns[22]
            LZA = columns[23]
            SZA = columns[24]
            LAM = columns[25]
            count += 1
            A = 0
            if count != 0: # gets rid of column titles
                filter1 = ((float(a) + float(b) + float(c))/3)
                filter1 = ("%.6f" %A)
                filter2 = (float(d) + float(e) + float(f))/3
                filter2 = ("%.6f" %filter2)
                filter3 = (float(g) + float(h) + float(i))/3
                filter3 = ("%.6f" %filter3)
                filter4 = (float(j) + float(k) + float(l))/3
                filter4 = ("%.6f" %filter4)
                filter5 = (float(m) + float(n) + float(o))/3
                filter5 = ("%.6f" %filter5)
                filter6 = (float(p) + float(q) + float(r))/3
                filter6 = ("%.6f" %filter6)
                filter7 = (float(s) + float(t) + float(u))/3
                filter7 = ("%.6f" %filter7)
                A = [filter1, filter2, filter3, filter4, filter5, filter6, filter7]
                A = ",".join(str(x) for x in A).join('[]')
                print A
                avg = [float(sum(col))/float(len(col)) for col in zip(*A)]
                print avg
I have also tried formatting the data like so:
    A = ('{0} {1} {2} {3} {4} {5} {6} {7} {8}'.format(date, time, float(filter1), float(filter2), float(filter3), float(filter4), float(filter5), float(filter6), float(filter7))+'\n') # average of triplets
    print A
thinking I could access the values of each column and perform the necessary math on them by indexing them as you would with a dictionary. However, this was unsuccessful: the data seemed to be recognized either as a row (so trying to access any column by [0] was out of bounds) or as individual characters, not as a list of numbers. Is this related to using the float function?
Upvotes: 2
Views: 236
Reputation: 123453
I'm not sure I understand which columns you want to average in 3), but maybe this does what you want:
    with open("test2.xls") as w:
        w.next()  # skip over header row
        for row in w:
            (date, time, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t,
             u, LZA, SZA, LAM) = row.split("\t")  # split columns into fields
            A = [(float(a) + float(b) + float(c))/3,
                 (float(d) + float(e) + float(f))/3,
                 (float(g) + float(h) + float(i))/3,
                 (float(j) + float(k) + float(l))/3,
                 (float(m) + float(n) + float(o))/3,
                 (float(p) + float(q) + float(r))/3,
                 (float(s) + float(t) + float(u))/3]
            print ('['+ ', '.join(['{:.6f}']*len(A)) + ']').format(*A)
            avg = sum(A)/len(A)
            print avg
You could do the same thing a little more concisely with code like the following:
    avg = lambda nums: sum(nums)/float(len(nums))
    with open("test2.xls") as w:
        w.next()  # skip over header row
        for row in w:
            cols = row.split("\t")  # split into columns
            # then split that into fields
            date, time, values, LZA, SZA, LAM = (cols[0], cols[1],
                                                 map(float, cols[2:23]),
                                                 cols[23], cols[24], cols[25])
            A = [avg(values[i:i+3]) for i in xrange(0, 21, 3)]
            print ('['+ ', '.join(['{:.6f}']*len(A)) + ']').format(*A)
            print avg(A)
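If step 3 instead means averaging each of the seven triplet columns over all rows, one possible approach is to collect the per-row averages as floats and transpose them with zip. The following is only a sketch under assumptions: it is written for Python 3 (unlike the Python 2 snippets above), the sample data is made up, and the helper names `mean`, `triplet_averages` and `column_averages` are hypothetical. It assumes the layout from the question (date, time, 21 readings, LZA, SZA, LAM, tab-separated, one header row).

```python
import io

# Made-up sample standing in for test2.xls (assumption: one header row, then
# tab-separated rows of date, time, 21 readings, LZA, SZA, LAM).
HEADER = "\t".join(["date", "time"] + ["v%d" % n for n in range(21)]
                   + ["LZA", "SZA", "LAM"])
ROW1 = "\t".join(["2013-01-01", "12:00"]
                 + [str(float(n)) for n in range(1, 22)] + ["0", "0", "0"])
ROW2 = "\t".join(["2013-01-01", "12:01"]
                 + [str(float(n)) for n in range(3, 24)] + ["0", "0", "0"])
SAMPLE = "\n".join([HEADER, ROW1, ROW2]) + "\n"


def mean(nums):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(nums) / float(len(nums))


def triplet_averages(line):
    """Return the seven group-of-three averages for one data row."""
    cols = line.rstrip("\n").split("\t")
    values = [float(x) for x in cols[2:23]]
    return [mean(values[i:i + 3]) for i in range(0, 21, 3)]


def column_averages(lines):
    """Average each of the seven triplet columns over all data rows."""
    lines = iter(lines)
    next(lines)  # skip the header row
    rows = [triplet_averages(line) for line in lines if line.strip()]
    # zip(*rows) transposes the list of rows into a sequence of columns.
    return [mean(col) for col in zip(*rows)]


col_avgs = column_averages(io.StringIO(SAMPLE))
# Keep the values as floats for the math; format only when printing.
print(", ".join("{:.6f}".format(x) for x in col_avgs))
```

The values stay floats until the final print. Formatting them with "%.6f" earlier would turn them into strings, and zip would then iterate over individual characters rather than numbers.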
Upvotes: 1
Reputation: 137
I would consider using numpy. I'm not sure how to read in xls files, but there seem to be packages out there that provide this functionality. I'd do something like this:
    import numpy as np
    with open("test2.txt") as f:
        next(f)  # skip the header row of column titles
        for row in f:
            # row is a string, split on tabs, but ignore the values that
            # don't go into the average. If you need to keep those you
            # might want to look into genfromtxt and defining special datatypes
            data = np.array(row.split('\t')[2:23]).astype(float)
            # split the data array into 7 separate arrays (3 columns each) and average on those
            avg = np.mean(np.array_split(data,7))
            print avg
I'm not sure if the avg above is exactly what you want. You might need to save off the smaller arrays (smallArrays = np.array_split(data,7)) and then iterate over those, calculating the average as you go.
Even if this isn't exactly what you want, I recommend looking into numpy. I've found it to be really easy to use and very helpful when it comes to doing calculations like you're trying to do.
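To illustrate keeping the smaller averages around, here is a rough NumPy sketch that loads all rows at once, averages each group of three, and then averages each resulting column. It is an assumption-laden example, not a drop-in solution: it targets Python 3, the sample data is invented, and it assumes the layout from the question (tab-separated, one header row, the 21 readings in columns 2 to 22).

```python
import io

import numpy as np

# Invented sample data (assumption): header row, then date, time,
# 21 readings, LZA, SZA, LAM, separated by tabs.
header = "\t".join(["date", "time"] + ["v%d" % n for n in range(21)]
                   + ["LZA", "SZA", "LAM"])
row1 = "\t".join(["2013-01-01", "12:00"]
                 + [str(float(n)) for n in range(1, 22)] + ["0", "0", "0"])
row2 = "\t".join(["2013-01-01", "12:01"]
                 + [str(float(n)) for n in range(3, 24)] + ["0", "0", "0"])
sample = "\n".join([header, row1, row2]) + "\n"

# Read only the 21 numeric columns; ndmin=2 keeps a 2-D array even for one row.
data = np.loadtxt(io.StringIO(sample), delimiter="\t", skiprows=1,
                  usecols=list(range(2, 23)), ndmin=2)

# Shape (rows, 21) -> (rows, 7, 3); the mean over the last axis gives the
# seven group-of-three averages for every row.
triplets = data.reshape(-1, 7, 3).mean(axis=2)

# Mean over axis 0 averages each of the seven columns across all rows.
col_avgs = triplets.mean(axis=0)

print(", ".join("%.6f" % x for x in col_avgs))
```

With a real file, the `io.StringIO(sample)` argument would be replaced by the file name.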
Upvotes: 0
Reputation: 856
You can use the decimal module to display the exact numbers.
    from decimal import *
    getcontext().prec = 6 # sets the precision to 6
Note that the precision counts significant digits, not decimal places, which means that:
    print(Decimal(1) / Decimal(7)) # 0.142857
    print(Decimal(100) / Decimal(7)) # results in 14.2857
This means you probably need to set the precision to a higher value to get 6 decimal places... for example:
    from decimal import *
    getcontext().prec = 28
    print("{0:.6f}".format(Decimal(100) / Decimal(7))) # 14.285714
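If the rounded value is needed as a number rather than as a formatted string, Decimal.quantize is one option. The sketch below assumes the default context precision of 28 and Python 3; the helper name `mean_to_six_places`, the choice of ROUND_HALF_UP and the input values are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Exponent template: quantize() rounds to the same number of decimal places.
SIX_PLACES = Decimal("0.000001")


def mean_to_six_places(values):
    """Average numeric strings as Decimals and round to 6 decimal places."""
    nums = [Decimal(v) for v in values]
    return (sum(nums) / len(nums)).quantize(SIX_PLACES, rounding=ROUND_HALF_UP)


result = mean_to_six_places(["100", "0", "0", "0", "0", "0", "0"])
print(result)  # 14.285714
```

Building the Decimals from the strings read out of the file, rather than from floats, avoids carrying over binary floating-point representation error.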
To give a complete answer to your question, could you explain what average you are looking for? The average over all (21) columns? Could you maybe post some test_data.xls?
Upvotes: 1