Reputation: 59
I have a dataframe of observations, which looks like (indexes are mixed up after sorting the dataset by date):
date depth temp salt
25799 1962-10-25 08:00:00 -242.34 5.470 12.900
25798 1962-10-25 08:00:00 -227.19 5.460 12.840
25797 1962-10-25 08:00:00 -201.93 5.500 12.750
25796 1962-10-25 08:00:00 -176.68 5.530 12.610
25795 1962-10-25 08:00:00 -151.43 5.500 12.370
... ... ... ...
21617 2019-02-06 07:59:00 -20.00 3.780 7.532
21616 2019-02-06 07:59:00 -15.40 3.771 7.531
21615 2019-02-06 07:59:00 -10.00 3.759 7.529
21614 2019-02-06 07:59:00 -5.40 3.747 7.528
21613 2019-02-06 07:59:00 -0 5.748 7.528
I need to make interpolation of temperature and salinity values to the depth levels from -230, -225, ..., -5, -0 for each date:
interp_dep = [val for val in range(-230, 5, 5)]
The method described in the topic Python Pandas interpolate with new x-axis doesn't work for me, as I don't consider indexes. The interpolation within a groupby described in Pandas interpolate within a groupby is not suitable, as I need to interpolate the values by group of date, but to the new depth values. I am sort of confused, how to deal with this. Appreciate any help.
Upvotes: 1
Views: 350
Reputation: 1832
After discussing some missunderstandings in the comments I think I can provide a solution. An Example output for the meassurement time point 05.08.1968 23:59:00
Example Output
Out[20]:
depth temp salt
1968-05-08 23:59:00 -235 5.766500 12.716000
1968-05-08 23:59:00 -230 5.764000 12.706000
1968-05-08 23:59:00 -225 5.761500 12.696000
1968-05-08 23:59:00 -220 5.752000 12.683000
1968-05-08 23:59:00 -215 5.732000 12.665500
1968-05-08 23:59:00 -210 5.712000 12.648000
1968-05-08 23:59:00 -205 5.692000 12.630500
1968-05-08 23:59:00 -200 5.675200 12.607200
...
1968-05-08 23:59:00 -40 2.985000 7.580000
1968-05-08 23:59:00 -35 3.282500 7.550000
1968-05-08 23:59:00 -30 3.580000 7.520000
1968-05-08 23:59:00 -25 3.797500 7.502500
1968-05-08 23:59:00 -20 4.015000 7.485000
1968-05-08 23:59:00 -15 4.232500 7.467500
1968-05-08 23:59:00 -10 4.450000 7.450000
1968-05-08 23:59:00 -5 4.516667 7.450000
Solution
For my final code solution, I change the index to datetime
object, use groupby()
to iterate over all measurement time points and then us np.interp()
for the interpolation. The results are saved to a list of pandas DataFrame and finally concatenated to one dataframe with interpolated_df = pd.concat(results)
import pandas as pd
from pathlib import Path
import numpy as np
df = pd.read_csv(Path("C:/Test/obs.csv"))
df['date'] = pd.to_datetime(df['date'])
df_nid=df.set_index('date')
# timepoints = df_nid['1968-05-08 23'].groupby('date')
timepoints = df_nid.groupby('date')
results = []
x = np.arange(-235,0,5)
for name, time in timepoints:
time =time.sort_values('depth')
xp = time['depth']
yp_temp = time['temp']
y_temp = np.interp(x,xp,yp_temp)
yp_salt = time['salt']
y_salt = np.interp(x, xp, yp_salt)
data = {'depth':x,
'temp':y_temp,
'salt':y_salt}
# Create 47 x copies of the current Timestamps as index [-235:0:5] Steps
idx = [name] * len(x)
results.append(pd.DataFrame(data=data,index=idx))
interpolated_df = pd.concat(results)
Upvotes: 1