Alina Lerner
Alina Lerner

Reputation: 59

Interpolate to the new x-axis by group in pandas

I have a dataframe of observations, which looks like (indexes are mixed up after sorting the dataset by date):

                     date   depth   temp    salt
25799 1962-10-25 08:00:00 -242.34  5.470  12.900
25798 1962-10-25 08:00:00 -227.19  5.460  12.840
25797 1962-10-25 08:00:00 -201.93  5.500  12.750
25796 1962-10-25 08:00:00 -176.68  5.530  12.610
25795 1962-10-25 08:00:00 -151.43  5.500  12.370
                      ...     ...    ...     ...
21617 2019-02-06 07:59:00  -20.00  3.780   7.532
21616 2019-02-06 07:59:00  -15.40  3.771   7.531
21615 2019-02-06 07:59:00  -10.00  3.759   7.529
21614 2019-02-06 07:59:00   -5.40  3.747   7.528
21613 2019-02-06 07:59:00      -0  5.748   7.528

I need to make interpolation of temperature and salinity values to the depth levels from -230, -225, ..., -5, -0 for each date:

interp_dep = [val for val in range(-230, 5, 5)]

The method described in the topic Python Pandas interpolate with new x-axis doesn't work for me, as I don't consider indexes. The interpolation within a groupby described in Pandas interpolate within a groupby is not suitable, as I need to interpolate the values by group of date, but to the new depth values. I am sort of confused, how to deal with this. Appreciate any help.

Upvotes: 1

Views: 350

Answers (1)

Björn
Björn

Reputation: 1832

After discussing some missunderstandings in the comments I think I can provide a solution. An Example output for the meassurement time point 05.08.1968 23:59:00

Example Output

Out[20]: 
                     depth      temp       salt
1968-05-08 23:59:00   -235  5.766500  12.716000
1968-05-08 23:59:00   -230  5.764000  12.706000
1968-05-08 23:59:00   -225  5.761500  12.696000
1968-05-08 23:59:00   -220  5.752000  12.683000
1968-05-08 23:59:00   -215  5.732000  12.665500
1968-05-08 23:59:00   -210  5.712000  12.648000
1968-05-08 23:59:00   -205  5.692000  12.630500
1968-05-08 23:59:00   -200  5.675200  12.607200
               ...
1968-05-08 23:59:00    -40  2.985000   7.580000
1968-05-08 23:59:00    -35  3.282500   7.550000
1968-05-08 23:59:00    -30  3.580000   7.520000
1968-05-08 23:59:00    -25  3.797500   7.502500
1968-05-08 23:59:00    -20  4.015000   7.485000
1968-05-08 23:59:00    -15  4.232500   7.467500
1968-05-08 23:59:00    -10  4.450000   7.450000
1968-05-08 23:59:00     -5  4.516667   7.450000

Solution For my final code solution, I change the index to datetime object, use groupby()to iterate over all measurement time points and then us np.interp() for the interpolation. The results are saved to a list of pandas DataFrame and finally concatenated to one dataframe with interpolated_df = pd.concat(results)

import pandas as pd
from pathlib import Path
import numpy as np


df = pd.read_csv(Path("C:/Test/obs.csv"))
df['date'] = pd.to_datetime(df['date'])
df_nid=df.set_index('date')

# timepoints = df_nid['1968-05-08 23'].groupby('date')
timepoints = df_nid.groupby('date')

results = []
x = np.arange(-235,0,5)

for name, time in timepoints:
    time =time.sort_values('depth')
    xp = time['depth']

    yp_temp = time['temp']
    y_temp = np.interp(x,xp,yp_temp)

    yp_salt = time['salt']
    y_salt = np.interp(x, xp, yp_salt)

    data = {'depth':x,
            'temp':y_temp,
            'salt':y_salt}
    # Create 47 x copies of the current Timestamps as index [-235:0:5] Steps
    idx = [name] * len(x)
    results.append(pd.DataFrame(data=data,index=idx))

interpolated_df = pd.concat(results)

Upvotes: 1

Related Questions