Reputation: 3852
The purpose of this post is to understand how to add a column to a level in a MultiIndex.DataFrame
using apply()
and shift()
Create the DataFrame
import pandas as pd
df = pd.DataFrame(
[
[5777, 100, 5385, 200, 5419, 4887, 100, 200],
[4849, 0, 4539, 0, 3381, 0, 0, ],
[4971, 0, 3824, 0, 4645, 3424, 0, 0, ],
[4827, 200, 3459, 300, 4552, 3153, 100, 200, ],
[5207, 0, 3670, 0, 4876, 3358, 0, 0, ],
],
index=pd.to_datetime(['2010-01-01',
'2010-01-02',
'2010-01-03',
'2010-01-04',
'2010-01-05']),
columns=pd.MultiIndex.from_tuples(
[('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
('Portfolio A', 'USD', 'amount'), ('Portfolio A', 'USD', 'injection'),
('Portfolio B', 'JPY', 'amount'), ('Portfolio B', 'JPY', 'injection')])
).sortlevel(axis=1)
print df
I would like to use the following method to add a new column to each currency at level 2 named daily_added_value:
def do_nothing(group):
return group
def calc_daily_added_value(group):
g = (group['amount'] - group['amount'].shift(periods=1, freq=None, axis=0)
-df['injection'].shift(periods=1, freq=None, axis=0)).round(decimals=2)
g.index = ['daily_added_value']
return g
pd.concat([df.T.groupby(level=0).apply(f).T for f in [calc_daily_added_value,do_nothing ]], axis=1).sort_index(axis=1)
However this throws a key error: KeyError: 'amount'
What is the correct syntax for the method calc_daily_added_value()
?
Following on from the answer below there is still an issue
Adding the daily return works
dav = df.loc[:, pd.IndexSlice[:, :, 'daily_added_value']]
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
dr = (dav.values / amount.shift()) * 100
dr.columns.set_levels(['daily_return'], level=2, inplace=True)
df = pd.concat([df, dr], axis=1).sortlevel(axis=1)
Adding the cumulative compounded returns FAILS
dr = df.loc[:, pd.IndexSlice[:, :, 'daily_return']]
drc = 100*((1+dr / 100).cumprod()-1)
drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)
df = pd.concat([df, drc], axis=1).sort_index(axis=1)
df.head()
this fails because it is missing the .values, but if I add this it becomes an array?
What is strange here though is that drc is in fact a DataFrame of correct shaped etc. and appears to contain correct results.
This fails on this line:
drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)
Error is ValueError: On level 2, label max (2) >= length of level (1). NOTE: this index is in an inconsistent state
How can the index be placed back into a consistent state?
Upvotes: 3
Views: 1721
Reputation: 294218
Skip the groupby
it is not necessary
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
inject = df.loc[:, pd.IndexSlice[:, :, 'injection']]
dav = amount - amount.shift() - inject.shift().values
#dav.columns.set_levels(['daily_added_value'], level=2, inplace=True)
pd.concat([df, dav], axis=1).sort_index(axis=1).T
T
to get a picture that would easily fitthere appears to be a bug in set_levels
and as such it is not advised to use it.
Workaround to rename the MultiIndex Column in the DataFrame dav
def map_level(df, dct, level=2):
index = df.index
index.set_levels([[dct.get(item, item) for item in names] if i==level else names
for i, names in enumerate(index.levels)], inplace=True)
dct = {'amount':'daily_added_value'}
map_level(dav.T, dct, level=2)
Upvotes: 2