Reputation: 495
I have a pandas dataframe with column of year month data(yyyymm). I am planning on interpolate data to daily & weekly values. Here is my df below.
df:
201301 201302 201303 ... 201709 201710
a 0.747711 0.793101 0.771819 ... 0.818161 0.812522
b 0.776537 0.759745 0.733673 ... 0.757496 0.765181
c 0.801699 0.847655 0.796586 ... 0.784537 0.763551
d 0.797942 0.687899 0.729911 ... 0.819887 0.772395
e 0.777472 0.799676 0.782947 ... 0.804533 0.791759
f 0.780933 0.750774 0.781056 ... 0.790846 0.773705
g 2.071699 2.261739 2.126915 ... 1.891780 2.098914
As you can see that my df is in montly column data and I am hoping to change this to daily values. I am planning on using linear functions. Here is example.
# (201302 - 201301)/31 (since January 2013 has 31 days)
a = (0.793101-0.747711)/31
# now a is the daily increasing (or decresing depends on values) value for a day.
# 2013-01-01 value woud be
0.747711
# 2013-01-02 value woud be
0.747711 + a
# 2013-01-03 value woud be
0.747711 + (a*2)
# last day of January would be
0.747711 + (a*30)
# first day of Feb would be
0.747711 + (a*31) which is 0.793101 (201302 value)
So my df_daily would have every day from 2013 to 2017 Oct(first day) and the values would be just like above. I am very week on working with timestamps so it would be great if there is any way to interpolate my value from month to day values. Thanks!
Oh please let me know if my question is confusing...
Upvotes: 2
Views: 4248
Reputation: 863166
First convert columns to datetimes
by to_datetime
, then reindex
for NaN
s for missing days and last interpolate
:
df.columns = pd.to_datetime(df.columns, format='%Y%m')
#by first and last values of columns
rng = pd.date_range(df.columns[0], df.columns[-1])
#alternatively min and max of columns
#rng = pd.date_range(df.columns.min(), df.columns.max())
df = df.reindex(rng, axis=1).interpolate(axis=1)
Verify solution:
a = (0.793101-0.747711)/31
print (0.747711 + a)
print (0.747711 + a*2)
print (0.747711 + a*3)
0.7491751935483871
0.7506393870967742
0.7521035806451613
print (df)
2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06 \
a 0.747711 0.749175 0.750639 0.752104 0.753568 0.755032
b 0.776537 0.775995 0.775454 0.774912 0.774370 0.773829
c 0.801699 0.803181 0.804664 0.806146 0.807629 0.809111
d 0.797942 0.794392 0.790842 0.787293 0.783743 0.780193
e 0.777472 0.778188 0.778905 0.779621 0.780337 0.781053
f 0.780933 0.779960 0.778987 0.778014 0.777042 0.776069
g 2.071699 2.077829 2.083960 2.090090 2.096220 2.102351
2013-01-07 2013-01-08 2013-01-09 2013-01-10 ... 2017-09-22 \
a 0.756496 0.757960 0.759425 0.760889 ... 0.814214
b 0.773287 0.772745 0.772204 0.771662 ... 0.762876
c 0.810594 0.812076 0.813559 0.815041 ... 0.769847
d 0.776643 0.773094 0.769544 0.765994 ... 0.786643
e 0.781770 0.782486 0.783202 0.783918 ... 0.795591
f 0.775096 0.774123 0.773150 0.772177 ... 0.778847
g 2.108481 2.114611 2.120742 2.126872 ... 2.036774
2017-09-23 2017-09-24 2017-09-25 2017-09-26 2017-09-27 2017-09-28 \
a 0.814026 0.813838 0.813650 0.813462 0.813274 0.813086
b 0.763132 0.763388 0.763644 0.763900 0.764156 0.764413
c 0.769147 0.768448 0.767748 0.767049 0.766349 0.765650
d 0.785060 0.783476 0.781893 0.780310 0.778727 0.777144
e 0.795165 0.794740 0.794314 0.793888 0.793462 0.793036
f 0.778276 0.777705 0.777133 0.776562 0.775990 0.775419
g 2.043678 2.050583 2.057487 2.064392 2.071296 2.078201
2017-09-29 2017-09-30 2017-10-01
a 0.812898 0.812710 0.812522
b 0.764669 0.764925 0.765181
c 0.764950 0.764251 0.763551
d 0.775561 0.773978 0.772395
e 0.792611 0.792185 0.791759
f 0.774848 0.774276 0.773705
g 2.085105 2.092010 2.098914
[7 rows x 1735 columns]
Upvotes: 4