Reputation: 481
I am attempting to calculate the area of the blue region and the area of yellow region: In this graph: y=blue, peak_line=green, thresh=orange.
I am using this code:
idx = np.argwhere(np.diff(np.sign(y - peak_line))).flatten()
bounds = [1077.912, 1078.26, 1078.336, 1078.468, 1078.612, 1078.78, 1078.828, 1078.88, 1079.856, 1079.86]
plt.plot(x, y, x, thresh, x, peak_line)
plt.fill_between(x, y, thresh, where=(y>=peak_line),interpolate=True, color='#fff8ba')
plt.fill_between(x, thresh, peak_line, where=(y<=peak_line),interpolate=True, color='#fff8ba')
plt.fill_between(x, y, peak_line, where=(y>=peak_line) & (x>=x[idx][0]) & (x<=bounds[-1]), interpolate=True, color='#CDEAFF')
plt.plot(x[idx], y[idx], 'ro')
plt.show()
estimated_y = interp1d(x, y, kind='cubic')
estimated_peak_line = interp1d(x, peak_line, kind='cubic')
estimated_thresh = interp1d(x, thresh, kind='cubic')
yellow_areas = []
blue_areas = []
for i in range(len(bounds) - 1):
midpoint = (bounds[i] + bounds[i+1]) / 2
if estimated_y(midpoint) < estimated_peak_line(midpoint):
above_peak_line = abs(integrate.quad(estimated_peak_line, bounds[i], bounds[i+1])[0])
above_thresh_line = abs(integrate.quad(estimated_thresh, bounds[i], bounds[i+1])[0])
yellow_areas.append(above_peak_line - above_thresh_line)
else:
above_peak_line = abs(integrate.quad(estimated_peak_line, bounds[i], bounds[i+1])[0])
above_y = abs(integrate.quad(estimated_y, bounds[i], bounds[i+1])[0])
blue_areas.append(above_peak_line - above_y)
print(sum(yellow_areas))
print(sum(blue_areas))
4.900000000000318
2.999654602006661
I thought I calculated the area of the blue region and the area of yellow region correct, until I calculated the area of the polygon:
bunch_of_xs = np.linspace(min(x), max(x), num=10000, endpoint=True)
final_curve = estimated_y(bunch_of_xs)
final_thresh = estimated_thresh(bunch_of_xs)
final_peak_line = estimated_peak_line(bunch_of_xs)
def PolygonArea(corners):
n = len(corners) # of corners
area = 0.0
for i in range(n):
j = (i + 1) % n
area += corners[i][0] * corners[j][1]
area -= corners[j][0] * corners[i][1]
area = abs(area) / 2.0
return area
vertex1 = (bunch_of_xs[0], final_thresh[0])
vertex2 = (bunch_of_xs[-1], final_thresh[-1])
vertex3 = (x[idx][-1], y[idx][-1])
vertex4 = (x[idx][0], y[idx][0])
coords = (vertex1,vertex2,vertex3,vertex4)
plt.plot(x, y, 'o', bunch_of_xs, final_curve, '--', bunch_of_xs, final_thresh, bunch_of_xs, final_peak_line)
x_val = [x[0] for x in coords]
y_val = [x[1] for x in coords]
plt.plot(x_val,y_val,'or')
print("Coordinates of total polygon:", coords)
print("Total polygon area:", PolygonArea(coords))
Coordinates of total polygon: ((1077.728, -41.30177170550451), (1079.96, -42.254314285935834), (1079.86, -49.207348695828706), (1077.912, -48.271572477115136))
Total polygon area: 14.509708069890621
The sum of the area of the blue region and the area of yellow region should equal the total polygon area.
4.900000000000318 + 2.999654602006661 ≠ 14.509708069890621
What am I doing wrong?
Edit: This code will be used for many different graphs. Not all graphs look the same. For example, this graph has 3 blue regions and so I have to calculate the area of all 3 blue regions and add them together to get the total blue area. Every graph has a different amount of blue regions (some only have 1 region). So, I have to make the code flexible to account for the possibility of a graph having multiple blue regions to add together to get the total blue region area.
Upvotes: 4
Views: 1251
Reputation: 25489
In general, the area between two curves f(x)
and g(x)
is integral(g(x) - f(x))
.
So say we have two curves:
xvals = np.linspace(0, 1, 100)
yvals_1 = np.sin(xvals * 10)
yvals_2 = 0.5 - 0.5 * xvals
plt.plot(xvals, yvals_1, '-b')
plt.plot(xvals, yvals_2, '-g')
The "transformed" curve becomes:
yvals_3 = yvals_1 - yvals_2
plt.plot(xvals, yvals_3, '--r')
plt.plot(xvals, np.zeros(xvals.shape), '--k')
And since we want to ignore everything under the green line,
yvals_3[yvals_3 < 0] = 0
plt.plot(xvals, yvals_3, '-r')
Since you want to impose additional constraints, such as "only the area between the first and last intersections", do that now.
# Cheating a little bit -- but you already know how to get the intersections.
first_intersection_x = xvals[4]
last_intersection_x = xvals[94]
cfilter = np.logical_and(xvals >= first_intersection_x, xvals <= last_intersection_x)
xvals_calc = xvals[cfilter]
yvals_calc = yvals_3[cfilter]
The area under this curve is easily calculated using np.trapz
area_under_curve = np.trapz(yvals_calc, xvals_calc)
Of course, this answer assumes that yvals_1
and yvals_2
are available at the same xvals
. If not, interpolation is easy.
Upvotes: 2
Reputation: 689
Since I don't have all of your data I will give something between pseudo-code and implementation.
Say we have arrays x
(x-axis), y1
(data), y2
(some line which bounds the parts over which we want to integrate).
First step: Iterate over your bounds array and see which parts we want to integrate over. I assume that you have the bounds array already, as your question suggests.
def get_pairs_of_idxs(x, y1, y2, bounds):
lst_pairs = []
for i in range(len(bounds)-1):
x0, x1 = bounds[i], bounds[i+1]
xc = 0.5 * (x0 + x1) # we want to see if the straight line y2 is above or below, so we take one x value and test it
indx_xc = np.searchsorted(x, xc) # this returns us the index at which xc is located
y1c, y2c = y1[indx_xc], y2[indx_xc]
if y2c < y1c: # then the line is below the curve, so we want to integrate
lst_pairs.append((x0, x1))
Now we have a list of pairs of indices, between which we want to integrate.
def solution(x, y1, y2, bounds):
tot_area = 0
lst_pairs = get_pairs_of_idxs(x, y1, y2, bounds)
for x0, x1 in lst_pairs:
mask = np.logical_and(x >= x0, x <= x1) # relevant places in x and y data
xs = x[mask] # the x values along which we integrate
ys = (y2 - y1)[mask] # we want to integrate the difference of the curves
tot_area += np.trapz(ys, xs)
return tot_area
That's what I was thinking about.
Upvotes: 2