I'm currently working on a project to estimate flow meter uncertainty. The meter uncertainty is based on four different inputs: liq, visc, wlr and gvf.
A third party provides tables for the meter at multiple different values of liq, cP, wlr and gvf. As you can guess, the data from the meter never falls perfectly on one of the predefined values. For example, a minute of data may read:
With the data above, a four-way interpolation on the tables is performed to find the uncertainty.
I've come up with a solution, but it seems clunky, and I'm wondering if anyone has any ideas. I'm still new to the pandas game and really appreciate seeing other people's solutions.
Initially I filter the data to reduce the table down to the values above and below the actual point that I'm looking for.
aliq = 6532 # stbpd
avisc = 22 # centipoise
awlr = 0.412 # water liquid ratio
agvf = 0.634 # gas volume fraction
def findclose(num, colm):
    arr = colm.unique()
    if num in arr:
        clslo = num
        clshi = num
    else:
        clslo = arr[arr < num].max()  # closest grid value below
        clshi = arr[arr > num].min()  # closest grid value above
    return [clslo, clshi]
df = tbl_vx52[
    (tbl_vx52['liq'].isin(findclose(aliq, tbl_vx52['liq']))) &
    (tbl_vx52['visc'].isin(findclose(avisc, tbl_vx52['visc']))) &
    (tbl_vx52['wlr'].isin(findclose(awlr, tbl_vx52['wlr']))) &
    (tbl_vx52['gvf'].isin(findclose(agvf, tbl_vx52['gvf'])))
].reset_index(drop=True)
The table is reduced from 2240 rows down to 16. Instead of including all the data (tbl_vx52), I've included code below to build the reduced dataframe, called df, with just the values above and below the target point for this example.
df = pd.DataFrame({
    'liq':    [5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 7000, 7000, 7000, 7000, 7000, 7000, 7000, 7000],
    'visc':   [10, 10, 10, 10, 30, 30, 30, 30, 10, 10, 10, 10, 30, 30, 30, 30],
    'wlr':    [0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5],
    'gvf':    [0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75],
    'uncert': [0.0707, 0.0992, 0.0906, 0.1278, 0.0705, 0.0994, 0.091, 0.128, 0.0702, 0.0991, 0.0905, 0.1279, 0.0704, 0.0992, 0.0904, 0.1283],
})
Some pretty crude looping is done to start pairing the values based on individual inputs (either liq, visc, wlr or gvf). Shown below is the first loop, on gvf.
pairs = [
    slice(0, 1),
    slice(2, 3),
    slice(4, 5),
    slice(6, 7),
    slice(8, 9),
    slice(10, 11),
    slice(12, 13),
    slice(14, 15)]

for pair in pairs:
    df.loc[pair, 'uncert'] = np.interp(
        agvf,
        df.loc[pair, 'gvf'],
        df.loc[pair, 'uncert']
    )
    df.loc[pair, 'gvf'] = agvf

df = df.drop_duplicates().reset_index(drop=True)
The duplicate values are dropped, reducing the table from 16 rows to 8. This is then repeated for wlr.
pairs = [
    slice(0, 1),
    slice(2, 3),
    slice(4, 5),
    slice(6, 7)
]

for pair in pairs:
    df.loc[pair, 'uncert'] = np.interp(
        awlr,
        df.loc[pair, 'wlr'],
        df.loc[pair, 'uncert']
    )
    df.loc[pair, 'wlr'] = awlr

df = df.drop_duplicates().reset_index(drop=True)
The structure above is repeated for visc (down to four rows) and finally liq (down to two rows) until only one row is left, which gives the meter uncertainty at your operating point.
I know it's pretty clunky. Any input or thoughts on different methods is appreciated.
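For what it's worth, the four pair-and-interpolate passes can be folded into one helper that collapses one input column at a time. This is only a sketch: `collapse` is a hypothetical name, and it assumes the filtered table always holds exactly two grid values per remaining axis (the 16-row sub-dataframe above).

```python
import numpy as np
import pandas as pd

def collapse(df, col, target, cols=('liq', 'visc', 'wlr', 'gvf')):
    """Interpolate 'uncert' to `target` along `col`, halving the table.

    Groups on the other input columns so each group holds the two rows
    bracketing `target`, then replaces each pair with one interpolated row.
    """
    others = [c for c in cols if c != col]
    rows = []
    for key, grp in df.groupby(others):
        grp = grp.sort_values(col)  # np.interp needs increasing x values
        row = dict(zip(others, key))
        row[col] = target
        row['uncert'] = np.interp(target, grp[col], grp['uncert'])
        rows.append(row)
    return pd.DataFrame(rows)

df = pd.DataFrame({
    'liq':    [5000]*8 + [7000]*8,
    'visc':   [10, 10, 10, 10, 30, 30, 30, 30]*2,
    'wlr':    [0.375, 0.375, 0.5, 0.5]*4,
    'gvf':    [0.625, 0.75]*8,
    'uncert': [0.0707, 0.0992, 0.0906, 0.1278, 0.0705, 0.0994, 0.091, 0.128,
               0.0702, 0.0991, 0.0905, 0.1279, 0.0704, 0.0992, 0.0904, 0.1283],
})

# Collapse one axis at a time; 16 -> 8 -> 4 -> 2 -> 1 rows.
targets = {'gvf': 0.634, 'wlr': 0.412, 'visc': 22, 'liq': 6532}
sub = df
for col, target in targets.items():
    sub = collapse(sub, col, target)
uncert = sub['uncert'].iloc[0]
```

The grouping replaces the hand-written slice pairs, so the same helper works for each axis without renumbering rows after every `drop_duplicates()`.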
Alright, I was able to find and apply a matrix-based solution. It is based on the matrix method for trilinear interpolation, which can be extended to quad-linear interpolation. Wikipedia provides a good write-up on trilinear interpolation. The 8x8 matrix in the Wikipedia article expands to a 16x16 for quad-linear interpolation. A single function, written below, makes each row of the matrix.
def quad_row(x, y, z, k):
    """
    Generate a row for the quad interpolation matrix.
    x, y, z, k are scalar input values.
    """
    qrow = [1,
            x, y, z, k,
            x*y, x*z, x*k, y*z, y*k, z*k,
            x*y*z, x*y*k, x*z*k, y*z*k,
            x*y*z*k]
    return qrow
It should be evident that this is just an extension of the rows inside the trilinear matrix. The function can be called sixteen times, once per corner point, to generate the entire matrix.
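For instance, looping quad_row over the sixteen corner points builds the full matrix (a sketch; the corner values are taken from the example table above):

```python
import numpy as np
from itertools import product

def quad_row(x, y, z, k):
    # One row of the 16x16 quad-linear interpolation matrix
    return [1,
            x, y, z, k,
            x*y, x*z, x*k, y*z, y*k, z*k,
            x*y*z, x*y*k, x*z*k, y*z*k,
            x*y*z*k]

# Two grid values per axis -> 2**4 = 16 corner points
corners = product([5000, 7000], [10, 30], [0.375, 0.5], [0.625, 0.75])
quad_matrix = np.array([quad_row(*c) for c in corners])
```

As long as the sixteen corners are distinct, the matrix is invertible, and the sixteen polynomial coefficients follow from a single linear solve.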
Side Note: If you want to get fancy you can accomplish the quad_row function using itertools combinations. The advantage is that you can input an array of any size and it returns the properly formatted row for the interpolation matrix. The function is more flexible, but ultimately slower.
from itertools import combinations

def interp_row(values):
    values = np.asarray(values)
    n = len(values)
    intp_row = [1]
    for i in range(1, n + 1):
        # np.prod replaces the deprecated np.product
        intp_row.extend([np.prod(x) for x in combinations(values, i)])
    return intp_row
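Since itertools.combinations emits subsets in the same order as the hand-written row, the two functions agree for four inputs (a quick self-contained check):

```python
import numpy as np
from itertools import combinations

def quad_row(x, y, z, k):
    return [1, x, y, z, k,
            x*y, x*z, x*k, y*z, y*k, z*k,
            x*y*z, x*y*k, x*z*k, y*z*k, x*y*z*k]

def interp_row(values):
    values = np.asarray(values)
    intp_row = [1]
    for i in range(1, len(values) + 1):
        intp_row.extend(np.prod(x) for x in combinations(values, i))
    return intp_row

print(interp_row([2, 3, 5, 7]) == quad_row(2, 3, 5, 7))  # -> True
```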
The function that accepts an input table, finds the values close to your interpolated values, builds the interpolation matrix and performs the matrix math is shown below.
def quad_interp(values, table):
    """
    values - four points to interpolate at, passed as a list or numpy array
    table - lookup data, four input columns and one output column
    """
    table = np.asarray(table)
    A, B, C, D, E = np.transpose(table)
    a, b, c, d = values
    in_vector = quad_row(a, b, c, d)

    # note: A..D are numpy arrays here, so findclose() must use
    # np.unique(colm) rather than the pandas-only colm.unique()
    mask = (
        np.isin(A, findclose(a, A)) &
        np.isin(B, findclose(b, B)) &
        np.isin(C, findclose(c, C)) &
        np.isin(D, findclose(d, D)))

    quad_matrix = []
    c_vector = []
    for row in table[mask]:
        x, y, z, v, w = row
        quad_matrix.append(quad_row(x, y, z, v))
        c_vector.append(w)

    # np.linalg.solve is preferred over forming the explicit inverse
    a_vector = np.linalg.solve(np.asarray(quad_matrix), np.asarray(c_vector))
    return float(np.dot(a_vector, in_vector))
For example, calling the function would look like this.
df = pd.DataFrame({
    'liq':    [5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 7000, 7000, 7000, 7000, 7000, 7000, 7000, 7000],
    'visc':   [10, 10, 10, 10, 30, 30, 30, 30, 10, 10, 10, 10, 30, 30, 30, 30],
    'wlr':    [0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5, 0.375, 0.375, 0.5, 0.5],
    'gvf':    [0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75, 0.625, 0.75],
    'uncert': [0.0707, 0.0992, 0.0906, 0.1278, 0.0705, 0.0994, 0.091, 0.128, 0.0702, 0.0991, 0.0905, 0.1279, 0.0704, 0.0992, 0.0904, 0.1283],
})
values = [6532, 22, 0.412, 0.634]
quad_interp(values, df)
As seen, no error handling exists for the above function. It will break down if either of the following is attempted:
1. Interpolating values outside the table boundaries.
2. Inputting lookup values that are already in the table, resulting in fewer than 16 points being selected.
Also, I acknowledge the following:
1. The naming convention could have been better.
2. A faster way may exist for creating the mask.
The function findclose() is shown in the original question (with np.unique(colm) in place of the pandas-only colm.unique(), since quad_interp passes it plain numpy arrays).
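As a sanity check on the whole pipeline, here is a condensed, self-contained version of the functions above (np.linalg.solve swapped in for the explicit inverse, np.unique so findclose accepts arrays). Quad-linear interpolation must exactly reproduce any function that is linear in each variable separately, which gives a test case with a known answer:

```python
import numpy as np
from itertools import product

def findclose(num, colm):
    # Bracketing grid values; np.unique also sorts
    arr = np.unique(colm)
    if num in arr:
        return [num, num]
    return [arr[arr < num].max(), arr[arr > num].min()]

def quad_row(x, y, z, k):
    return [1, x, y, z, k,
            x*y, x*z, x*k, y*z, y*k, z*k,
            x*y*z, x*y*k, x*z*k, y*z*k, x*y*z*k]

def quad_interp(values, table):
    table = np.asarray(table, dtype=float)
    A, B, C, D, _ = table.T
    a, b, c, d = values
    mask = (np.isin(A, findclose(a, A)) & np.isin(B, findclose(b, B)) &
            np.isin(C, findclose(c, C)) & np.isin(D, findclose(d, D)))
    sub = table[mask]
    quad_matrix = np.array([quad_row(*row[:4]) for row in sub])
    coeffs = np.linalg.solve(quad_matrix, sub[:, 4])
    return float(np.dot(coeffs, quad_row(a, b, c, d)))

# The test function x/10000 + 2*y + z*k is linear in each variable,
# so the interpolation should return 0.6 + 40 + 0.28 at the point below.
pts = product([5000, 7000], [10, 30], [0.375, 0.5], [0.625, 0.75])
table = [[x, y, z, k, x / 10000 + 2 * y + z * k] for x, y, z, k in pts]
result = quad_interp([6000, 20, 0.4, 0.7], table)
```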
Please let me know if you have any feedback or room for improvement.