Reputation: 10060
I have a pandas
dataframe with 9 time series variables: AS
, CR6
,TCPR
, .... I need to interpolate values in the dataframe, though the final form of the dataframe needs to stay in the pivoted form for subsequent analysis. The tricky thing is that the dataframe is pivoted where the year is in the column name.
AS_2014 AS_2015 AS_2016 AS_2017 AS_2018 AS_2019 CR6_2014 CR6_2015 CR6_2016 CR6_2017 ... TCPR123_2016 TCPR123_2017 TCPR123_2018 TCPR123_2019 TDS_2014 TDS_2015 TDS_2016 TDS_2017 TDS_2018 TDS_2019
s_wellid
0103041-001 NaN 2.3 NaN NaN NaN 0.0 0.0 NaN NaN NaN ... NaN NaN NaN 0.000 NaN 350.0 NaN NaN NaN 350.0
0105020-001 0.0 NaN 0.0 NaN NaN NaN NaN NaN 1.9 NaN ... NaN NaN 0.001 0.000 NaN NaN NaN NaN NaN NaN
0110001-008 2.0 2.0 2.0 2.0 2.0 NaN 1.0 1.0 1.0 1.0 ... NaN NaN 0.005 0.005 490.0 520.0 530.0 490.0 460.0 NaN
0110001-009 2.0 2.0 2.0 2.0 2.0 NaN 1.0 1.0 1.0 1.0 ... NaN NaN 0.005 NaN 570.0 550.0 540.0 550.0 560.0 NaN
0110001-010 2.0 2.0 2.0 2.0 2.0 NaN 1.0 1.0 1.0 1.0 ... NaN NaN 0.005
I need to basically group all of the columns by the prefix like AS_*
or CR6_*
, and then interpolate between the years.
I have tried unpivoting the data with the melt()
command, but that creates all kinds of problems in trying to re-pivot the final results. Does someone know how to get started on this kind of problem.
Upvotes: 0
Views: 431
Reputation: 862901
First use str.split
by columns for MultiIndex
, so possible reshape by DataFrame.stack
with first level, then use DataFrame.droplevel
:
df.columns = df.columns.str.split('_', expand=True)
df1 = df.stack(0).interpolate(axis=1)
print (df1)
2014 2015 2016 2017 2018 2019
s_wellid
0103041-001 AS NaN 2.3 1.725 1.15 0.575 0.000
CR6 0.0 0.0 0.000 0.00 0.000 0.000
TCPR123 NaN NaN NaN NaN NaN 0.000
TDS NaN 350.0 350.000 350.00 350.000 350.000
0105020-001 AS 0.0 0.0 0.000 0.00 0.000 0.000
CR6 NaN NaN 1.900 1.90 1.900 1.900
TCPR123 NaN NaN NaN NaN 0.001 0.000
0110001-008 AS 2.0 2.0 2.000 2.00 2.000 2.000
CR6 1.0 1.0 1.000 1.00 1.000 1.000
TCPR123 NaN NaN NaN NaN 0.005 0.005
TDS 490.0 520.0 530.000 490.00 460.000 460.000
0110001-009 AS 2.0 2.0 2.000 2.00 2.000 2.000
CR6 1.0 1.0 1.000 1.00 1.000 1.000
TCPR123 NaN NaN NaN NaN 0.005 0.005
TDS 570.0 550.0 540.000 550.00 560.000 560.000
0110001-010 AS 2.0 2.0 2.000 2.00 2.000 2.000
CR6 1.0 1.0 1.000 1.00 1.000 1.000
TCPR123 NaN NaN NaN NaN 0.005 0.005
Last reshape back by DataFrame.unstack
and convert MultiIndex
to column with map
and join
:
df3 = df1.unstack(1)
df3.columns = df3.columns.map('_'.join)
print (df3)
2014_AS 2014_CR6 2014_TCPR123 2014_TDS 2015_AS 2015_CR6 \
s_wellid
0103041-001 NaN 0.0 NaN NaN 2.3 0.0
0105020-001 0.0 NaN NaN NaN 0.0 NaN
0110001-008 2.0 1.0 NaN 490.0 2.0 1.0
0110001-009 2.0 1.0 NaN 570.0 2.0 1.0
0110001-010 2.0 1.0 NaN NaN 2.0 1.0
2015_TCPR123 2015_TDS 2016_AS 2016_CR6 ... 2017_TCPR123 \
s_wellid ...
0103041-001 NaN 350.0 1.725 0.0 ... NaN
0105020-001 NaN NaN 0.000 1.9 ... NaN
0110001-008 NaN 520.0 2.000 1.0 ... NaN
0110001-009 NaN 550.0 2.000 1.0 ... NaN
0110001-010 NaN NaN 2.000 1.0 ... NaN
2017_TDS 2018_AS 2018_CR6 2018_TCPR123 2018_TDS 2019_AS \
s_wellid
0103041-001 350.0 0.575 0.0 NaN 350.0 0.0
0105020-001 NaN 0.000 1.9 0.001 NaN 0.0
0110001-008 490.0 2.000 1.0 0.005 460.0 2.0
0110001-009 550.0 2.000 1.0 0.005 560.0 2.0
0110001-010 NaN 2.000 1.0 0.005 NaN 2.0
2019_CR6 2019_TCPR123 2019_TDS
s_wellid
0103041-001 0.0 0.000 350.0
0105020-001 1.9 0.000 NaN
0110001-008 1.0 0.005 460.0
0110001-009 1.0 0.005 560.0
0110001-010 1.0 0.005 NaN
[5 rows x 24 columns]
Upvotes: 2