Reputation: 111
I am trying to perform a linear interpolation in Python such that the interpolated values are calculated off certain values in a certain column.
In my example below, I want to interpolate Measurements for Products "a", and "b" using the actual numerical values from the relevant cells in Height column in the interpolation. Could I call the Height column the index for the interpolation?
When I am interpolating the missing Measurements for Product "a", the index values to use in interpolation are 4, 5 and 7. For interpolation of Product "b", index values to use are 1, 2.2 and 3.
I know about dataframe.interpolate() but I am struggling to modify my code to use the correct index values.
Here is the code for the dataframe I am starting with:
import pandas as pd
testdata1 = [('Product', ['a', 'a', 'a', 'b', 'b', 'b','b']),
('Height', ['4', '5', '7', '1', '2.2', '3','4']),
('Measurement', ['35.00', '', '55.00','10.00','','30.00','40.00']),
]
df = pd.DataFrame.from_items(testdata1)
df
And here is the code for the dataframe I need:
targetdf = [('Product', ['a', 'a', 'a', 'b', 'b', 'b','b']),
('Height', ['4', '5', '7', '1', '2.2', '3','4']),
('Measurement', ['35.00', '41.67', '55.00','10.00','22.00','30.00','40.00']),
]
df2 = pd.DataFrame.from_items(targetdf)
df2
If this cannot be done with a dataframe, I am open to other ideas to do it in Python.
Any help is greatly appreciated. I am new to Python. Thank you.
Upvotes: 1
Views: 1539
Reputation: 862481
You can use first some data cleaning:
#convert '' to `NaN`s and all strings to numeric
df['Measurement'] = pd.to_numeric(df['Measurement'], errors='coerce')
#convert strings to numeric
df['Height'] = df['Height'].astype(float)
Then set_index
, groupby
and apply
custom function with interpolate
, last reset_index
and get original order of columns by reindex
:
df1 = df.set_index('Height') \
.groupby('Product') \
.apply(lambda x: x.interpolate(method='index')) \
.reset_index() \
.reindex(columns=df.columns)
print (df1)
Product Height Measurement
0 a 4.0 35.000000
1 a 5.0 41.666667
2 a 7.0 55.000000
3 b 1.0 10.000000
4 b 2.2 22.000000
5 b 3.0 30.000000
6 b 4.0 40.000000
Upvotes: 2