Reputation: 45
I have two arrays of data: one holds radius values and the other the corresponding intensity reading at each radius.
For example, here is a small section of the data. The first column is the radius and the second is the intensity.
29.77036614 0.04464427
29.70281027 0.07771409
29.63523525 0.09424901
29.3639355 1.322793
29.29596385 2.321502
29.22783249 2.415751
29.15969437 1.511504
29.09139827 1.01704
29.02302068 0.9442765
28.95463729 0.3109002
28.88609766 0.162065
28.81754446 0.1356054
28.74883612 0.03637681
28.68004928 0.05952569
28.61125036 0.05291172
28.54229804 0.08432806
28.4732599 0.09950128
28.43877462 0.1091304
28.40421016 0.09629156
28.36961249 0.1193614
28.33500089 0.102711
28.30037503 0.07161685
How can I bin the radius data and find the average intensity corresponding to each radius bin?
The aim is to then use that average intensity to assign an intensity value to any radius whose intensity reading is missing (NaN).
I've never used the histogram functions before and have very little idea of how they work, or whether it's possible to do this with them. The full data set is large, with 336,622 data points, so I don't really want to use loops or if statements to achieve this.
Many thanks for any help.
Upvotes: 2
Views: 1074
Reputation: 8831
It's not really histogramming that you are after. A histogram is a count of the items that fall into each bin. What you want is more of a group-by operation: you group your intensities by radius intervals and then apply some aggregation method, like the mean or median, to each group of intensities.
What you are describing, however, sounds a lot more like some sort of interpolation. So I would suggest considering interpolation as an alternative way to solve your problem. Anyway, here's a suggestion for how to achieve what you asked for (assuming you can use numpy); I'm using random inputs to illustrate:
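As a minimal sketch of that group-by idea without a Python loop (which matters with ~336k points), the per-bin sums and counts can be built with np.bincount and then divided. The helper names binned_mean and fill_nan_with_bin_mean are hypothetical, and the sketch assumes the radius array itself contains no NaNs; NaN intensities are left out of the averages and then replaced by the mean of the bin they fall in.

```python
import numpy as np


def binned_mean(radius, intensity, n_bins):
    """Mean intensity per equal-width radius bin, computed without a Python loop.

    Assumes radius has no NaNs; NaN intensities are left out of sums and counts.
    Returns (edges, means, idx): bin edges, per-bin means (NaN for bins with no
    valid points) and the 0-based bin index of every input point.
    """
    edges = np.linspace(radius.min(), radius.max(), n_bins + 1)
    # digitize gives 1..n_bins (n_bins + 1 for radius.max()); shift to 0-based
    # and clip so the maximum radius lands in the last bin
    idx = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)
    valid = ~np.isnan(intensity)
    sums = np.bincount(idx[valid], weights=intensity[valid], minlength=n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # 0 / 0 gives NaN for empty bins
    return edges, means, idx


def fill_nan_with_bin_mean(radius, intensity, n_bins):
    """Return a copy of intensity with each NaN replaced by its bin's mean."""
    _, means, idx = binned_mean(radius, intensity, n_bins)
    filled = intensity.astype(float)  # astype returns a copy
    missing = np.isnan(filled)
    filled[missing] = means[idx[missing]]
    return filled
```

With the two columns loaded into arrays radius and intensity, calling fill_nan_with_bin_mean(radius, intensity, 200) would fill every NaN from its radius bin; the bin count of 200 is only an illustrative choice. A NaN that falls in a bin with no valid readings stays NaN.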
import numpy as np
# random inputs for illustration; replace with your own radius/intensity arrays
radius = np.random.random(1000) * 10
intensities = np.random.random(1000) * 10
# group your radius input into 20 equal-width bins (21 bin edges)
bins = np.linspace(radius.min(), radius.max(), 21)
groups = np.digitize(radius, bins)
# groups now holds the index of the bin into which radius[i] falls (1..20);
# radius.max() lies exactly on the last edge, so it gets index 21
# loop through the bin indexes that actually occur and select the corresponding intensities
# perform your aggregation on the selected intensities
# I'm keeping the aggregation for each group in a dict
aggregated = {}
for i in np.unique(groups):
    selected_intensities = intensities[groups == i]
    aggregated[i] = selected_intensities.mean()
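Since interpolation was suggested as the alternative, here is a minimal sketch of that route using np.interp, assuming linear interpolation between neighbouring radii is acceptable for the missing intensities. The name fill_nan_by_interpolation is a hypothetical helper. np.interp expects increasing x-coordinates, and the radii in the question are decreasing, so the known points are sorted first.

```python
import numpy as np


def fill_nan_by_interpolation(radius, intensity):
    """Replace NaN intensities by linear interpolation over radius.

    Assumes radius itself has no NaNs. The known (non-NaN) points are sorted
    by radius because np.interp needs increasing x values.
    """
    filled = intensity.astype(float)  # astype returns a copy
    missing = np.isnan(filled)
    order = np.argsort(radius[~missing])
    known_r = radius[~missing][order]
    known_i = filled[~missing][order]
    filled[missing] = np.interp(radius[missing], known_r, known_i)
    return filled
```

Note that np.interp returns the first or last known intensity for radii outside the range of known points, so NaNs at the very ends are filled with the nearest edge value rather than extrapolated.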
Upvotes: 1
Reputation: 69136
If you only need to do this for a handful of points, you could do something like this. If intensities and radius are numpy arrays of your data:
bin_width = 0.1  # Depending on how narrow you want your bins
def get_avg(rad):
    # select the points with rad - bin_width/2 <= radius < rad + bin_width/2
    in_bin = (radius >= rad - bin_width / 2.) & (radius < rad + bin_width / 2.)
    average_intensity = intensities[in_bin].mean()
    return average_intensity
# This will return the average intensity in the bin: 27.95 <= radius < 28.05
average = get_avg(28.)
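If you need this for many radii rather than a handful, each call to get_avg scans the whole array. A sketch of a vectorized version (get_avg_many is a hypothetical name) sorts the radii once and uses prefix sums with np.searchsorted, so every window average becomes a difference of two cumulative sums. It assumes intensities contains no NaNs; drop those points first, e.g. with keep = ~np.isnan(intensities) and then passing radius[keep] and intensities[keep].

```python
import numpy as np


def get_avg_many(radius, intensities, rads, bin_width=0.1):
    """Mean intensity in [rad - w/2, rad + w/2) for every rad in rads, without a loop.

    Assumes intensities has no NaNs. Windows containing no points give NaN.
    """
    order = np.argsort(radius)
    r_sorted = radius[order]
    # csum[k] = sum of the first k intensities in radius order
    csum = np.concatenate(([0.0], np.cumsum(intensities[order])))
    rads = np.asarray(rads, dtype=float)
    # first index with r >= lower edge, and first index with r >= upper edge
    lo = np.searchsorted(r_sorted, rads - bin_width / 2.0, side="left")
    hi = np.searchsorted(r_sorted, rads + bin_width / 2.0, side="left")
    counts = hi - lo
    with np.errstate(invalid="ignore", divide="ignore"):
        return (csum[hi] - csum[lo]) / counts
```

For example, get_avg_many(radius, intensities, [28.0]) should match get_avg(28.) up to floating-point rounding, while passing a whole array of query radii costs one sort plus two binary searches per query.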
Upvotes: 2