difference between 2 columns in Pandas Dataframe with multiindex

Question

I have a Pandas DataFrame whose columns are MultiIndex with 2 levels as follows:

index = ['monday','tuesday','wednesday']
tuples = [('yesterday','travel'),('yesterday','food'),('today','travel'),('today','food')]
columns = pd.MultiIndex.from_tuples(tuples,names=[None,'category'])
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(3, 4)), index=index, columns=columns)

i simply want to store the difference of each of the 'travel' and 'food' columns into a new top level column - e.g. 'diff' - next to 'yesterday' and 'travel'

diff = t['today'] - t['yesterday']

will return the underlying DataFrame that I am interested in but i cant figure out how to place it correctly within the overall DataFrame

something like:

pd.concat([df,diff],axis=1)

produces an interesting (but not correct) result

Ben.T · Accepted Answer

one way could be to make the columns of the diff as a MultiIndex such as:

diff = df['today'] - df['yesterday']
diff.columns = pd.MultiIndex.from_tuples([('diff',col) for col in diff.columns])

and then when you use concat, it gives:

print (pd.concat([df,diff],axis=1))
          yesterday       today        diff     
category     travel food travel food travel food
monday            8    7      7    1     -1   -6
tuesday           1    3      0    8     -1    5
wednesday         6    4      5    6     -1    2

Edit: Another way to not use MultiIndex could be to do directly the operation to create the columns:

df[[('diff','travel'),('diff','food')]] = df['today'] - df['yesterday']

and a more general way, you can use get_level_values

df[[('diff',col) for col in df.columns.get_level_values(1).unique()]] = df['today'] - df['yesterday']

difference between 2 columns in Pandas Dataframe with multiindex

Answers (1)

Related Questions