Reputation: 71
I use the Panda library to analyze the data coming from an excel file. I used pivot_table to get a pivot table with the information I'm interested in. I end up with a multi index array. For "OPE-2016-0001", I would like to obtain the figures for 2017 for example. I've tried lots of things and nothing works. What is the correct method to use? thank you
import pandas as pd
import numpy as np
from math import *
import tkinter as tk
pd.set_option('display.expand_frame_repr', False)
df = pd.read_csv('datas.csv')
def tcd_op_dataExcercice():
global df
new_df = df.assign(Occurence=1)
tcd= new_df.pivot_table(index=['Numéro opération',
'Libellé opération'],
columns=['Exercice'],
values=['Occurence'],
aggfunc=[np.sum],
margins=True,
fill_value=0,
margins_name='Total')
print(tcd)
print(tcd.xs('ALSTOM 8', level='Libellé opération', drop_level=False))
tcd_op_dataExcercice()
I get the following table (image). How do I get the value framed in red?
Upvotes: 0
Views: 1076
Reputation: 11
You can use .loc
to select rows by a DataFrame's Index's labels. If the Index is a MultiIndex, it will index into the first level of the MultiIndex (Numéro Opéracion
in your case). Though you can pass a tuple to index into both levels (e.g. if you specifically wanted ("OPE-2016-0001", "ALSTOM 8")
)
It's worth noting that the columns of your pivoted data are also a MultiIndex, because you specified the aggfunc
, values
and columns
as lists, rather than individual values (i.e. without the []
). Pandas creates a MultiIndex because of these lists, even though they had one
argument.
So you'll also need to pass a tuple to index into the columns to get the value for 2017:
tcd.loc["OPE-2016-0001", ('sum', 'Occurence', 2017)]
If you had instead just specified the aggfunc
etc as individual strings, the columns would just be the years and you could select the values by:
tcd.loc["OPE-2016-0001", 2017]
Or if you specifically wanted the value for ALSTOM 8
:
tcd.loc[("OPE-2016-0001", "ALSTOM 8"), 2017]
An alternative to indexing into a MultiIndex would also be to just .reset_index()
after pivoting -- in which case the levels of the MultiIndex will just become columns in the data. And you can then select rows based on the values of those columns. E.g (assuming you specified aggfunc
etc as strings):
tcd = tcd.reset_index()
tcd.query("'Numéro Opéracion' == 'OPE-2016-0001'")[2017]
Upvotes: 1