Hu gePanic
Hu gePanic

Reputation: 110

How to fit a piecewise step function to be below my data points?

I have a custom function that takes 6 parameters. It creates a step "curve".
I want to optimize/fit the last 5 parameters (m1, m2, FL1, FL2 and FL3).

def steps(x, FL1, m1, FL2, m2, FL3):
    if x > m1:
        return FL1
    if x > m2:
        return FL2
    else:
        return FL3

I want to fit this function to always be below my data points.

Example:

Simplified Example

The real data looks slightly more complex, like this, but still not complex:

Actual data

What functionality can I use here to fit this function?

I assume scipy.optimize.curve_fit does not work due to least_squares is used. That way it will be centered around the data points.

from scipy.optimize import curve_fit  

def steps(m, FL1, m1, FL2, m2, FL3):
    if m > m1:
        return FL1
    if mass > m2:
        return FL2
    else:
        return FL3

x_data = [8000, 7000, 6000, 5000]  
y_data = [300, 350, 400, 450]  

popt, _ = curve_fit(steps, x_data, y_data)  

This fails:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

It does not seem to like the if statements in the function.

Upvotes: 1

Views: 141

Answers (2)

kikyo91
kikyo91

Reputation: 37

I agree with Mikael's answer that you should try to make your expression as concise as possible. I have an example of your real data shape, that combines linear functions and a constant function.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Define step function
def step_func(x, a, c, f1, f2):
    result = []
    for xi in x:
        if xi >= 0 and xi < f1:
            result.append(a * xi)
        elif xi >= f1 and xi < f2:
            result.append(a * f1)
        elif xi >= f2:
            result.append(c * xi + a * f1 - f2 * c)
        else:
            result.append(0)
    return np.asarray(result)

# Generating some demo data
x_data = np.linspace(0, 10, 100)
y_data = step_func(x_data, 2, 1, 3, 7) + np.random.normal(0, 0.5, x_data.size)  # 添加噪声

# Fitting function
def fitting_func(x, a, c, f1, f2):
    return step_func(x, a, c, f1, f2)

# Fitting for data
popt, pcov = curve_fit(fitting_func, x_data, y_data, p0=[1, -1, 4, 6])

# Obtain fitting parameters
a_fit, c_fit, f1_fit, f2_fit = popt
print(f'Fitting param: a={a_fit}, c={c_fit}, f1={f1_fit}, f2={f2_fit}')

# Generat fitting results
y_fit = fitting_func(x_data, *popt)

# Plot
plt.scatter(x_data, y_data, label='Data', s=10)
plt.plot(x_data, y_fit, color='red', label='Fitted')
plt.vlines([f1_fit, f2_fit], ymin = min(x_data), ymax = max(x_data) )
# plt.xlim(0,16)
# plt.ylim(0,1)
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Step func fitting')
plt.show()

The above returns:

Fitting param: a=1.9532567200579396, c=1.085588494015425, f1=3.084102835100397, f2=7.1628504872999095

enter image description here

Upvotes: 0

Mikael &#214;hman
Mikael &#214;hman

Reputation: 2375

Your optimization only has 2 parameters to optimize; m1, m2. The height of the step follows from your interpolated data. For example, FL1 == y_data[0] always.

You'll need to express this in your function, and you could use it in curve_fit. It also expects you to vectorize your function, but i just couldn't be bothered.

If we express this function

import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

x_data = np.linspace(8000, 5000, 10)
y_data = np.array([400, 420, 440, 450, 453, 454, 452, 453, 453, 452])


def limit(m):
    return np.interp(m, np.flip(x_data), np.flip(y_data))


def constrained_steps(m, m1, m2):
    results = list()  # Could be vectorized if you cared enough.
    for i in range(len(m)):
        if m[i] > m1:
            results.append(y_data[0])
        elif m[i] > m2:
            results.append(limit(m1))
        else:
            results.append(limit(m2))
    return np.array(results)


# Initialize initial guess at 2/3 and 1/3's through the x_data.
bounds = x_data[[-1,0]]
p0 = ((bounds[1]-bounds[0])*2/3 + bounds[0], (bounds[1]-bounds[0])*1/3 + bounds[0])
popt, _ = curve_fit(constrained_steps, x_data, y_data, p0=p0, bounds=bounds)
opt_m1, opt_m2 = popt

ms = np.linspace(bounds[0], bounds[1], 1000)
plt.scatter(x_data, y_data)
plt.plot(ms, constrained_steps(ms, opt_m1, opt_m2))
plt.gca().invert_xaxis()
plt.show()

Upvotes: 1

Related Questions