Reputation: 119
I would like to add a new column between each of the date columns, with the interpolated value using the previous and next value to get a month value.
data = [['Jane', 10,11,45,66,21], ['John',11,55,34,44,22],['Tom',23,43,12,11,44]]
df = pd.DataFrame(data, columns = ['Name', '09-Aug', '02-Sep','18-Oct','02-Nov','14-Dec'])
This returns the following:
In between each column after the first one, I would like to add one which contains the month preceding it, and the interpolated value based on the preceding and next column.
So eg:
I tried to first add a column between each one using the following code:
N = len(df.columns) #
for i in range(0,N): #
df.insert(i,'','',allow_duplicates=True)
But this only adds columns to the left of the table, not between each one. Once I had added the columns, I was thinking of writing a function to perform the linear interpolation.
Does anyone have a suggestion on the correct way around to tackle this?
Upvotes: 1
Views: 1360
Reputation: 386
from numpy import nan
import pandas
data = [['Jane', 10,11,45,66,21], ['John',11,55,34,44,22],['Tom',23,43,12,11,44]]
df = pd.DataFrame(data, columns = ['Name', '09-Aug', '02-Sep','18-Oct','02-Nov','14-Dec'])
df_c = df.drop('Name', axis=1)
for i in range(1, len(df.columns) + len(df.columns)-3, 2):
col_title = df_c.iloc[:, i-1].name[3:]
df_c.insert(i, col_title, pd.Series([nan] * len(df.index)))
df[['Name']].join(df_c.interpolate(axis=1))
Upvotes: 1
Reputation: 13242
# Make Name your index, and Transpose:
df = df.set_index('Name').T
# Convert index to datetime:
df.index = pd.to_datetime(df.index, format='%d-%b')
# Create new values for each month:
new_index_vals = pd.date_range(df.index.min(), df.index.max(), freq='MS')
# Reindex, including these new values:
df = df.reindex(df.index.union(new_index_vals))
# Apply interpolation, accounting for time:
df = df.interpolate('time') # You can also choose just `linear` here~
# Convert back to original format, formatting month_starts differently:
df.index = np.where(df.index.is_month_start,
df.index.strftime('%B'),
df.index.strftime('%d-%b'))
# Transpose back to original format:
df = df.T.reset_index()
print(df.round(2))
Output:
Name 09-Aug September 02-Sep October 18-Oct November 02-Nov December 14-Dec
0 Jane 10.0 10.96 11.0 32.43 45.0 64.60 66.0 34.93 21.0
1 John 11.0 53.17 55.0 41.76 34.0 43.33 44.0 28.81 22.0
2 Tom 23.0 42.17 43.0 23.46 12.0 11.07 11.0 33.79 44.0
Upvotes: 0