Reputation: 351
I would like to find the difference between columns in Multi-index, I have three dimensions, the Family, the Date, and the Client, in the goal is to have new columns with the difference by rows with Client,Date and Family in Multi-index.
import pandas as pd
import numpy as np
data = {
'Family':{
0: 'Hugo',
1: 'Hugo',
2: 'Hugo',
3: 'Hugo'},
'Date': {
0: '2021-04-15',
1: '2021-04-16',
2: '2021-04-15',
3: '2021-04-16'},
'Client': {
0: 1,
1: 1,
2: 2,
3: 2},
'Code_Client': {
0: 605478.0,
1: 605478.0,
2: 605478.0,
3: 605478.0},
'Price': {
0: 2.23354416539888,
1: 2.0872536032616744,
2: 1.8426286431701764,
3: 0.3225935619590472}
}
df = pd.DataFrame(data)
pd.pivot_table(pd.DataFrame(data), values='Price', index=['Code_Client'],columns=
['Family','Date', 'Client'])
Do you have any idea?
Thank you,
Upvotes: 0
Views: 427
Reputation: 11502
I assume that you are looking for the difference for Price grouped by the Family
and Date
and Client
. Your formulation of the problem was somewhat unclear and you didn't post an expected output. I changed your dataframe slightly to add a family to make the solution more visible.
data = {
'Family':{
0: 'Hugo',
1: 'Hugo',
2: 'Victor',
3: 'Victor'},
'Date': {
0: '2021-04-15',
1: '2021-04-16',
2: '2021-04-15',
3: '2021-04-16'},
'Client': {
0: 1,
1: 1,
2: 2,
3: 2},
'Code_Client': {
0: 605478.0,
1: 605478.0,
2: 605478.0,
3: 605478.0},
'Price': {
0: 2.23354416539888,
1: 2.0872536032616744,
2: 1.8426286431701764,
3: 0.3225935619590472}
}
df = pd.DataFrame(data)
pd.pivot_table(pd.DataFrame(data), values='Price', index=['Code_Client'],columns=
['Family','Date', 'Client'])
As you can see, I added the Victor family. So, you dataframe looks like this:
Family Date Client Code_Client Price
0 Hugo 2021-04-15 1 605478.0 2.233544
1 Hugo 2021-04-16 1 605478.0 2.087254
2 Victor 2021-04-15 2 605478.0 1.842629
3 Victor 2021-04-16 2 605478.0 0.322594
To add a column of differences by groups, I suggest you do the following:
df = df.set_index(['Family', 'Date','Client']).sort_index()[['Price']]
df['diff'] = np.nan
idx = pd.IndexSlice
for ix in df.index.levels[0]:
df.loc[ idx[ix,:], 'diff'] = df.loc[idx[ix,:], 'Price' ].diff()
The first step indexes you variables (the ones you want to group by) and create an empty (or filled with nan
) column of difference. The second step populates it by the differences between rows, by groups.
This returns:
Price diff
Family Date Client
Hugo 2021-04-15 1 2.233544 NaN
2021-04-16 1 2.087254 -0.146291
Victor 2021-04-15 2 1.842629 NaN
2021-04-16 2 0.322594 -1.520035
If you are unhappy about the nan
, do this:
df = df.set_index(['Family', 'Date','Client']).sort_index()[['Price']]
df['diff'] = np.nan
idx = pd.IndexSlice
for ix in df.index.levels[0]:
df.loc[ idx[ix,:], 'diff'] = df.loc[idx[ix,:], 'Price' ].diff().fillna(0)
I added .fillna(0)
to the diff()
statement. It returns:
Price diff
Family Date Client
Hugo 2021-04-15 1 2.233544 0.000000
2021-04-16 1 2.087254 -0.146291
Victor 2021-04-15 2 1.842629 0.000000
2021-04-16 2 0.322594 -1.520035
Upvotes: 1