Reputation: 107
I am having a problem manipulating my excel file in python. I have a large excel file with data arranged by date/time. I would like to be able to average the data for a specific time of day, over all the different days; ie. to create an average profile of the gas_concentrations over 1 day.
Here is a sample of my excel file:
Decimal Day of year Decimal of day Gas concentration
133.6285 0.6285 46.51230
133.6493 0.6493 47.32553
133.6701 0.6701 49.88705
133.691 0.691 51.88382
133.7118 0.7118 49.524
133.7326 0.7326 50.37112
Basically I need a function, like the AVERAGEIF function in excel, that will say something like "Average the gas_concentrations when decimal_of_day=x" However I really have no idea how to do this. Currently I have got this far
import xlrd
import numpy as np
book= xlrd.open_workbook('TEST.xlsx')
level_1=book.sheet_by_index(0)
time_1=level_1.col_values(0, start_rowx=1, end_rowx=1088)
dectime_1=level_1.col_values(8, start_rowx=1, end_rowx=1088)
ozone_1=level_1.col_values(2, start_rowx=1, end_rowx=1088)
ozone_1 = [float(i) if i != 'NA' else 'NaN' for i in ozone_1]
Edit
I updated my script to include the following
ozone=np.array(ozone_1, float)
time=np.array(dectime_1)
a=np.column_stack((ozone, time))
b=np.where((a[:,0]<0.0035))
print b
EDIT Currently I solved the problem by putting both the variables into an array, then making a smaller array with just the variables I need to average - a bit inefficient but it works!
ozone=np.array(ozone_1, float)
time=np.array(dectime_1)
a=np.column_stack((ozone, time))
b=a[a[:,1]<0.0036]
c=np.nanmean(b[:,0])
Upvotes: 1
Views: 2238
Reputation: 1492
You can use numpy masked array.
import numpy as np
data_1 = np.ma.arange(10)
data_1 = np.ma.masked_where(<your if statement>, data_1)
data_1_mean = np.mean(data1)
Hope that helps
Upvotes: 1