user2153235
user2153235

Reputation: 1115

Is a pandas MultiIndex a counterpart to a SQL composite index?

I posted this on reddit some days ago, but haven't received any response.

Everything I've read online about the pandas MultiIndex makes it seem like a counterpart to a SQL composite index. Is this the correct understanding?

Additionally, MultiIndex is often described as hierarchical. This disrupts the analogy with a composite index. To me, that means a tree structure, with parent keys and child keys, possibly with a depth greater than 2. A composite index doesn't fit this picture. In the case of MultiIndexes, what are the parent/child keys?

Upvotes: 1

Views: 47

Answers (1)

Parman M. Alizadeh
Parman M. Alizadeh

Reputation: 1553

I think you almost answered your question. The key difference between a SQL composite index and a MultiIndex in pandas is the hierarchy with lower-level indexes having more priority.

Consider the following example:

import pandas as pd
import numpy as np

multi_index = pd.MultiIndex.from_product([['East', 'West'], ['Retail', 'Corporate'], ['Shirts', 'Pants', 'Accessories']],
                                         names=['Region', 'Customer', 'Product'])

df = pd.DataFrame(np.random.randn(12, 2), index=multi_index, columns=['Revenue', 'Profit'])
print(df)

Output

                                  Revenue     Profit
Region Customer Product                               
East   Retail   Shirts            0.469112 -0.282863
                Pants            -1.509059 -1.135632
                Accessories      -1.135632 -0.173215
      Corporate Shirts            0.119209 -1.044236
                Pants            -0.861849 -2.104569
                Accessories      -0.721189  0.265599
West  Retail    Shirts           -0.315196  0.854109
                Pants             0.572192 -0.273712
                Accessories       0.228440  0.574273
      Corporate Shirts           -0.284819 -0.671225
                Pants             0.146182  0.056439
                Accessories       0.423380  0.539674

In this example, we have three levels in the MultiIndex:

  1. The parent level is 'Region' with keys 'East' and 'West'.
  2. The second level is 'Customer' with keys 'Retail' and 'Corporate', nested under each 'Region'.
  3. The child level is 'Product' with keys 'Shirts', 'Pants', and 'Accessories', nested under each combination of 'Region' and 'Customer'.

Upvotes: 1

Related Questions