Reputation: 481

Calculating area under the curves

I am attempting to calculate the area of the blue region and the area of yellow region: In this graph: y=blue, peak_line=green, thresh=orange.

I am using this code:

idx = np.argwhere(np.diff(np.sign(y - peak_line))).flatten()
bounds = [1077.912, 1078.26, 1078.336, 1078.468, 1078.612, 1078.78, 1078.828, 1078.88, 1079.856, 1079.86]
plt.plot(x, y, x, thresh, x, peak_line)
plt.fill_between(x, y, thresh, where=(y>=peak_line),interpolate=True, color='#fff8ba')
plt.fill_between(x, thresh, peak_line, where=(y<=peak_line),interpolate=True, color='#fff8ba')
plt.fill_between(x, y, peak_line, where=(y>=peak_line) & (x>=x[idx][0]) & (x<=bounds[-1]), interpolate=True, color='#CDEAFF')
plt.plot(x[idx], y[idx], 'ro')
plt.show()

estimated_y = interp1d(x, y, kind='cubic')
estimated_peak_line = interp1d(x, peak_line, kind='cubic')
estimated_thresh = interp1d(x, thresh, kind='cubic')

yellow_areas = []
blue_areas = []
for i in range(len(bounds) - 1):
    midpoint = (bounds[i] + bounds[i+1]) / 2
    if estimated_y(midpoint) < estimated_peak_line(midpoint):
        above_peak_line = abs(integrate.quad(estimated_peak_line, bounds[i], bounds[i+1])[0])
        above_thresh_line = abs(integrate.quad(estimated_thresh, bounds[i], bounds[i+1])[0])
        yellow_areas.append(above_peak_line - above_thresh_line)

    else:
        above_peak_line = abs(integrate.quad(estimated_peak_line, bounds[i], bounds[i+1])[0])
        above_y = abs(integrate.quad(estimated_y, bounds[i], bounds[i+1])[0])
        blue_areas.append(above_peak_line - above_y)

print(sum(yellow_areas))
print(sum(blue_areas))
4.900000000000318
2.999654602006661

I thought I calculated the area of the blue region and the area of yellow region correct, until I calculated the area of the polygon:

bunch_of_xs = np.linspace(min(x), max(x), num=10000, endpoint=True)
final_curve = estimated_y(bunch_of_xs)
final_thresh = estimated_thresh(bunch_of_xs)
final_peak_line = estimated_peak_line(bunch_of_xs)

def PolygonArea(corners):
    n = len(corners) # of corners
    area = 0.0
    for i in range(n):
        j = (i + 1) % n
        area += corners[i][0] * corners[j][1]
        area -= corners[j][0] * corners[i][1]
    area = abs(area) / 2.0
    return area

vertex1 = (bunch_of_xs[0], final_thresh[0])
vertex2 = (bunch_of_xs[-1], final_thresh[-1])
vertex3 = (x[idx][-1], y[idx][-1])
vertex4 = (x[idx][0], y[idx][0])
coords = (vertex1,vertex2,vertex3,vertex4)
plt.plot(x, y, 'o', bunch_of_xs, final_curve, '--', bunch_of_xs, final_thresh, bunch_of_xs, final_peak_line)
x_val = [x[0] for x in coords]
y_val = [x[1] for x in coords]
plt.plot(x_val,y_val,'or')
print("Coordinates of total polygon:", coords)
print("Total polygon area:", PolygonArea(coords))
Coordinates of total polygon: ((1077.728, -41.30177170550451), (1079.96, -42.254314285935834), (1079.86, -49.207348695828706), (1077.912, -48.271572477115136))
Total polygon area: 14.509708069890621

The sum of the area of the blue region and the area of yellow region should equal the total polygon area.

4.900000000000318 + 2.999654602006661 ≠ 14.509708069890621

What am I doing wrong?

Edit: This code will be used for many different graphs. Not all graphs look the same. For example, this graph has 3 blue regions and so I have to calculate the area of all 3 blue regions and add them together to get the total blue area. Every graph has a different amount of blue regions (some only have 1 region). So, I have to make the code flexible to account for the possibility of a graph having multiple blue regions to add together to get the total blue region area.

Upvotes: 4

Answers (2)

pho

Reputation: 25489

In general, the area between two curves f(x) and g(x) is integral(g(x) - f(x)).

So say we have two curves:

xvals = np.linspace(0, 1, 100)
yvals_1 = np.sin(xvals * 10)
yvals_2 = 0.5 - 0.5 * xvals

plt.plot(xvals, yvals_1, '-b')
plt.plot(xvals, yvals_2, '-g')

The "transformed" curve becomes:

yvals_3 = yvals_1 - yvals_2
plt.plot(xvals, yvals_3, '--r')
plt.plot(xvals, np.zeros(xvals.shape), '--k')

And since we want to ignore everything under the green line,

yvals_3[yvals_3 < 0] = 0
plt.plot(xvals, yvals_3, '-r')

Since you want to impose additional constraints, such as "only the area between the first and last intersections", do that now.

# Cheating a little bit -- but you already know how to get the intersections.
first_intersection_x = xvals[4]
last_intersection_x = xvals[94]

cfilter = np.logical_and(xvals >= first_intersection_x, xvals <= last_intersection_x)
xvals_calc = xvals[cfilter]
yvals_calc = yvals_3[cfilter]

The area under this curve is easily calculated using np.trapz

area_under_curve = np.trapz(yvals_calc, xvals_calc)

Of course, this answer assumes that yvals_1 and yvals_2 are available at the same xvals. If not, interpolation is easy.

Upvotes: 2

yonatansc97

Reputation: 689

Since I don't have all of your data I will give something between pseudo-code and implementation. Say we have arrays x (x-axis), y1 (data), y2 (some line which bounds the parts over which we want to integrate).

First step: Iterate over your bounds array and see which parts we want to integrate over. I assume that you have the bounds array already, as your question suggests.

def get_pairs_of_idxs(x, y1, y2, bounds):
    lst_pairs = []
    for i in range(len(bounds)-1):
        x0, x1 = bounds[i], bounds[i+1]
        xc = 0.5 * (x0 + x1) # we want to see if the straight line y2 is above or below, so we take one x value and test it
        indx_xc = np.searchsorted(x, xc) # this returns us the index at which xc is located
        y1c, y2c = y1[indx_xc], y2[indx_xc]
        if y2c < y1c: # then the line is below the curve, so we want to integrate
            lst_pairs.append((x0, x1))

Now we have a list of pairs of indices, between which we want to integrate.

def solution(x, y1, y2, bounds):
     tot_area = 0
     lst_pairs = get_pairs_of_idxs(x, y1, y2, bounds)
     for x0, x1 in lst_pairs:
          mask = np.logical_and(x >= x0, x <= x1) # relevant places in x and y data
          xs = x[mask] # the x values along which we integrate
          ys = (y2 - y1)[mask] # we want to integrate the difference of the curves
          tot_area += np.trapz(ys, xs)
     return tot_area

That's what I was thinking about.

Upvotes: 2

Calculating area under the curves

Answers (2)

Related Questions