krishnab
krishnab

Reputation: 10060

Pandas: Interpolating time series on a pivoted dataframe

I have a pandas dataframe with 9 time series variables: AS, CR6,TCPR, .... I need to interpolate values in the dataframe, though the final form of the dataframe needs to stay in the pivoted form for subsequent analysis. The tricky thing is that the dataframe is pivoted where the year is in the column name.

               AS_2014  AS_2015 AS_2016 AS_2017 AS_2018 AS_2019 CR6_2014 CR6_2015 CR6_2016  CR6_2017    ... TCPR123_2016 TCPR123_2017 TCPR123_2018 TCPR123_2019     TDS_2014 TDS_2015 TDS_2016  TDS_2017 TDS_2018 TDS_2019
s_wellid                                                                                    
0103041-001     NaN     2.3     NaN     NaN     NaN     0.0     0.0     NaN     NaN     NaN     ...     NaN     NaN     NaN     0.000   NaN     350.0   NaN     NaN     NaN     350.0
0105020-001     0.0     NaN     0.0     NaN     NaN     NaN     NaN     NaN     1.9     NaN     ...     NaN     NaN     0.001   0.000   NaN     NaN     NaN     NaN     NaN     NaN
0110001-008     2.0     2.0     2.0     2.0     2.0     NaN     1.0     1.0     1.0     1.0     ...     NaN     NaN     0.005   0.005   490.0   520.0   530.0   490.0   460.0   NaN
0110001-009     2.0     2.0     2.0     2.0     2.0     NaN     1.0     1.0     1.0     1.0     ...     NaN     NaN     0.005   NaN     570.0   550.0   540.0   550.0   560.0   NaN
0110001-010     2.0     2.0     2.0     2.0     2.0     NaN     1.0     1.0     1.0     1.0     ...     NaN     NaN     0.005

I need to basically group all of the columns by the prefix like AS_* or CR6_*, and then interpolate between the years.

I have tried unpivoting the data with the melt() command, but that creates all kinds of problems in trying to re-pivot the final results. Does someone know how to get started on this kind of problem.

Upvotes: 0

Views: 431

Answers (1)

jezrael
jezrael

Reputation: 862901

First use str.split by columns for MultiIndex, so possible reshape by DataFrame.stack with first level, then use DataFrame.droplevel:

df.columns = df.columns.str.split('_', expand=True)
df1 = df.stack(0).interpolate(axis=1)

print (df1)
                      2014   2015     2016    2017     2018     2019
s_wellid                                                            
0103041-001 AS         NaN    2.3    1.725    1.15    0.575    0.000
            CR6        0.0    0.0    0.000    0.00    0.000    0.000
            TCPR123    NaN    NaN      NaN     NaN      NaN    0.000
            TDS        NaN  350.0  350.000  350.00  350.000  350.000
0105020-001 AS         0.0    0.0    0.000    0.00    0.000    0.000
            CR6        NaN    NaN    1.900    1.90    1.900    1.900
            TCPR123    NaN    NaN      NaN     NaN    0.001    0.000
0110001-008 AS         2.0    2.0    2.000    2.00    2.000    2.000
            CR6        1.0    1.0    1.000    1.00    1.000    1.000
            TCPR123    NaN    NaN      NaN     NaN    0.005    0.005
            TDS      490.0  520.0  530.000  490.00  460.000  460.000
0110001-009 AS         2.0    2.0    2.000    2.00    2.000    2.000
            CR6        1.0    1.0    1.000    1.00    1.000    1.000
            TCPR123    NaN    NaN      NaN     NaN    0.005    0.005
            TDS      570.0  550.0  540.000  550.00  560.000  560.000
0110001-010 AS         2.0    2.0    2.000    2.00    2.000    2.000
            CR6        1.0    1.0    1.000    1.00    1.000    1.000
            TCPR123    NaN    NaN      NaN     NaN    0.005    0.005

Last reshape back by DataFrame.unstack and convert MultiIndex to column with map and join:

df3 = df1.unstack(1)
df3.columns = df3.columns.map('_'.join)

print (df3)
             2014_AS  2014_CR6  2014_TCPR123  2014_TDS  2015_AS  2015_CR6  \
s_wellid                                                                    
0103041-001      NaN       0.0           NaN       NaN      2.3       0.0   
0105020-001      0.0       NaN           NaN       NaN      0.0       NaN   
0110001-008      2.0       1.0           NaN     490.0      2.0       1.0   
0110001-009      2.0       1.0           NaN     570.0      2.0       1.0   
0110001-010      2.0       1.0           NaN       NaN      2.0       1.0   

             2015_TCPR123  2015_TDS  2016_AS  2016_CR6  ...  2017_TCPR123  \
s_wellid                                                ...                 
0103041-001           NaN     350.0    1.725       0.0  ...           NaN   
0105020-001           NaN       NaN    0.000       1.9  ...           NaN   
0110001-008           NaN     520.0    2.000       1.0  ...           NaN   
0110001-009           NaN     550.0    2.000       1.0  ...           NaN   
0110001-010           NaN       NaN    2.000       1.0  ...           NaN   

             2017_TDS  2018_AS  2018_CR6  2018_TCPR123  2018_TDS  2019_AS  \
s_wellid                                                                    
0103041-001     350.0    0.575       0.0           NaN     350.0      0.0   
0105020-001       NaN    0.000       1.9         0.001       NaN      0.0   
0110001-008     490.0    2.000       1.0         0.005     460.0      2.0   
0110001-009     550.0    2.000       1.0         0.005     560.0      2.0   
0110001-010       NaN    2.000       1.0         0.005       NaN      2.0   

             2019_CR6  2019_TCPR123  2019_TDS  
s_wellid                                       
0103041-001       0.0         0.000     350.0  
0105020-001       1.9         0.000       NaN  
0110001-008       1.0         0.005     460.0  
0110001-009       1.0         0.005     560.0  
0110001-010       1.0         0.005       NaN  

[5 rows x 24 columns]

Upvotes: 2

Related Questions