walkingtomron
walkingtomron

Reputation: 71

How to select a value in a dataframe with MultiIndex?

I use the Panda library to analyze the data coming from an excel file. I used pivot_table to get a pivot table with the information I'm interested in. I end up with a multi index array. For "OPE-2016-0001", I would like to obtain the figures for 2017 for example. I've tried lots of things and nothing works. What is the correct method to use? thank you

import pandas as pd
import numpy as np
from math import *
import tkinter as tk
pd.set_option('display.expand_frame_repr', False)
df = pd.read_csv('datas.csv')


def tcd_op_dataExcercice():
global df
new_df = df.assign(Occurence=1)


tcd= new_df.pivot_table(index=['Numéro opération', 
                              'Libellé opération'],
                              columns=['Exercice'],
                              values=['Occurence'],
                              aggfunc=[np.sum],
                              margins=True,
                              fill_value=0,
                              margins_name='Total')
print(tcd)

print(tcd.xs('ALSTOM 8', level='Libellé opération', drop_level=False))

tcd_op_dataExcercice()

I get the following table (image). How do I get the value framed in red?

image

Upvotes: 0

Views: 1076

Answers (1)

harrisoneighty7
harrisoneighty7

Reputation: 11

You can use .loc to select rows by a DataFrame's Index's labels. If the Index is a MultiIndex, it will index into the first level of the MultiIndex (Numéro Opéracion in your case). Though you can pass a tuple to index into both levels (e.g. if you specifically wanted ("OPE-2016-0001", "ALSTOM 8"))

It's worth noting that the columns of your pivoted data are also a MultiIndex, because you specified the aggfunc, values and columns as lists, rather than individual values (i.e. without the []). Pandas creates a MultiIndex because of these lists, even though they had one argument.

So you'll also need to pass a tuple to index into the columns to get the value for 2017:

tcd.loc["OPE-2016-0001", ('sum', 'Occurence', 2017)]

If you had instead just specified the aggfunc etc as individual strings, the columns would just be the years and you could select the values by:

tcd.loc["OPE-2016-0001", 2017]

Or if you specifically wanted the value for ALSTOM 8:

tcd.loc[("OPE-2016-0001", "ALSTOM 8"), 2017]

An alternative to indexing into a MultiIndex would also be to just .reset_index() after pivoting -- in which case the levels of the MultiIndex will just become columns in the data. And you can then select rows based on the values of those columns. E.g (assuming you specified aggfunc etc as strings):

tcd =  tcd.reset_index()
tcd.query("'Numéro Opéracion' == 'OPE-2016-0001'")[2017]

Upvotes: 1

Related Questions