How to read multi index dataframe in python

Question

Here is my dataframe, which is called df:

University  Subject  Colour
Melb        Math     Red
            English  Blue
Sydney      Math     Green
            Arts     Yellow
            English  Green
Ottawa      Med      Blue
            Math     Yellow

Both University and Subject are index keys for this dataframe.

When I do this

print(df.to_dict('index'))

I get

{('Melb', 'Math'): {'Colour': 'Red'}, ('Melb', 'English'): {'Colour': 'Blue'}, ...

When I do this

print(df["Colour"])

I get this

University  Subject  Colour
Melb        Math     Red
            English  Blue
Sydney      Math     Green
            Arts     Yellow
            English  Green
Ottawa      Med      Blue
            Math     Yellow

When I do

print(df["University"])

I get an error

KeyError: 'University'

What I want is a way to read each value separately: one read for University, another for Subject, and a third for Colour.

How can I do that?

Jay Shukla · Accepted Answer

A quicker way to do this is to use Python's built-in zip function. This is usually faster than manually running a for loop over the index.

Quick answer to your question:

university_list, subject_list = zip(*df.index)  # one tuple per index level (assumes a two-level index)
colour_list = list(df['Colour'])

Explanation

To get the index levels as a list:

index_list = list(zip(*df.index))

Output:

[('Melb','Melb','Sydney','Sydney','Sydney','Ottawa','Ottawa'),('Math','English','Math','Arts','English','Med','Math')]

You will get a list of tuples, where each tuple corresponds to one index column and holds one entry per row (so repeated labels such as 'Melb' appear more than once).

(The tuples are in left-to-right order: the first index column is the first tuple, the second index column is the second tuple, and so on.)
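To make the transposition concrete, here is a minimal sketch that uses a plain list of tuples as a stand-in for what iterating df.index yields on the question's data. The name index_rows is only illustrative and does not come from the question.

```python
# Iterating a two-level MultiIndex yields one (University, Subject) tuple per row.
# This plain list stands in for list(df.index) on the question's data.
index_rows = [
    ("Melb", "Math"),
    ("Melb", "English"),
    ("Sydney", "Math"),
    ("Sydney", "Arts"),
    ("Sydney", "English"),
    ("Ottawa", "Med"),
    ("Ottawa", "Math"),
]

# zip(*rows) transposes rows into columns: one tuple per index level.
universities, subjects = zip(*index_rows)

print(universities)
print(subjects)
```

Both resulting tuples have the same length as the dataframe, so position i in each one refers to row i.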

Now, to get a separate list for each index column, you can simply do:

Universities = list(index_list[0]) # list of University labels, one per row: ['Melb','Melb','Sydney',...]
Subjects = list(index_list[1]) # list of Subject labels, one per row: ['Math','English','Math','Arts',...]
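As an alternative to zip (not part of the original answer), pandas can return the labels of a single index level by name through Index.get_level_values. The sketch below rebuilds the question's dataframe first; that construction is an assumption, since the question only shows the printed frame.

```python
import pandas as pd

# Rebuild the question's dataframe (assumed construction; the question
# only shows the printed result).
df = pd.DataFrame(
    {
        "University": ["Melb", "Melb", "Sydney", "Sydney", "Sydney", "Ottawa", "Ottawa"],
        "Subject": ["Math", "English", "Math", "Arts", "English", "Med", "Math"],
        "Colour": ["Red", "Blue", "Green", "Yellow", "Green", "Blue", "Yellow"],
    }
).set_index(["University", "Subject"])

# get_level_values returns one label per row for the named index level.
universities = df.index.get_level_values("University").tolist()
subjects = df.index.get_level_values("Subject").tolist()

# unique() drops repeats while keeping the order of first appearance.
unique_universities = df.index.get_level_values("University").unique().tolist()

print(universities)
print(subjects)
print(unique_universities)
```

Selecting a level by name keeps working if the order of the index levels changes, whereas the positions used with zip do not.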

Getting data as a list from non-index columns

You can do this with:

column_data = list(df['column_name'])

# which in your case will be

colour_list = list(df['Colour'])
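If the goal is to read University, Subject and Colour together for each row, one possible approach is to iterate over the column with Series.items(), which yields the index label alongside the value. This is a sketch rather than part of the original answer, and the dataframe construction is assumed from the printed frame in the question.

```python
import pandas as pd

# Rebuild the question's dataframe (assumed construction).
df = pd.DataFrame(
    {
        "University": ["Melb", "Melb", "Sydney", "Sydney", "Sydney", "Ottawa", "Ottawa"],
        "Subject": ["Math", "English", "Math", "Arts", "English", "Med", "Math"],
        "Colour": ["Red", "Blue", "Green", "Yellow", "Green", "Blue", "Yellow"],
    }
).set_index(["University", "Subject"])

# With a MultiIndex, each label from items() is a (University, Subject) tuple,
# so it can be unpacked directly next to the column value.
rows = [
    (university, subject, colour)
    for (university, subject), colour in df["Colour"].items()
]

for university, subject, colour in rows:
    print(university, subject, colour)
```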

I am extending the answer to address one of the comments.

Now imagine a case where you need the whole dataframe as a list of tuples, where each tuple holds the data of one column (index columns included).

The list will look something like this:

[(Col-1_data, ...), (Col-2_data, ...), ...]

To achieve this, you have to reset the index, fetch the data, and then set the index again. The code below does this:

index_names = list(df.index.names) # save the current index names so that we can reassign them later
df.reset_index(inplace = True) # move the index levels back into ordinary columns
dataframe_raw_list = df.values.tolist() # a list of lists, where each inner list is a row of the dataframe
df.set_index(index_names, inplace = True) # restore the original index

dataframe_columns_list = list(zip(*dataframe_raw_list)) # a list of tuples, where each tuple is a column of the dataframe

Output: