plotmaster473

Reputation: 160

multi index with .loc on columns

I have a DataFrame with MultiIndex columns, built as follows:

import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

s = pd.DataFrame(np.random.randn(8), index=index).T

which looks like this

                    bar                      baz                   foo                      qux
          one       two          one         two         one       two          one         two
0   -0.144135   0.625481    -2.139184   -1.066893   -0.123791   -1.058165   0.495627    -0.654353

The documentation says to index such columns in the following way:

df.loc[:, (slice("bar", "two"), ...)]

so I tried:

s.loc[:, (slice("bar", "two"):(slice("baz", "two")))]

which gives me a SyntaxError.

  Cell In[98], line 3
    s.loc[:, (slice("bar", "two"):(slice("baz", "two")))]
                                 ^
SyntaxError: invalid syntax

In my actual use case (beyond the scope of this question), the level-1 labels are timestamps (years), but I figure the answer should be the same. What is the proper way to select a range of columns with .loc when the columns are a MultiIndex?

Upvotes: 3

Views: 88

Answers (2)

ouroboros1

Reputation: 14414

As per the documentation, you have a few options to return this slice:

Option 1: hierarchical index using tuples (docs section)

(See also answer by @Koki.)

s.loc[:, ('bar', 'two'):('baz', 'two')]

Here the start, ('bar', 'two'), and the stop, ('baz', 'two'), are given as plain tuples, with a colon (:) between them to select the range of columns from start to stop, both endpoints included.
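One caveat worth checking, as a minimal sketch: a tuple range like this relies on the column MultiIndex being lexicographically sorted. The frame `unsorted` and its values below are made up for illustration and are not from the question. On an unsorted index pandas raises `UnsortedIndexError`, and `sort_index(axis=1)` restores an order that can be sliced.

```python
import numpy as np
import pandas as pd
from pandas.errors import UnsortedIndexError

# Hypothetical frame whose column MultiIndex is deliberately out of order.
cols = pd.MultiIndex.from_tuples(
    [("baz", "two"), ("bar", "one"), ("baz", "one"), ("bar", "two")],
    names=["first", "second"],
)
unsorted = pd.DataFrame(np.arange(4).reshape(1, 4), columns=cols)

# A tuple range needs a lexsorted index, so this attempt fails.
try:
    unsorted.loc[:, ("bar", "two"):("baz", "two")]
    raised = False
except UnsortedIndexError:
    raised = True
print("raised on unsorted columns:", raised)

# Sorting the columns lexicographically makes the range well defined.
ordered = unsorted.sort_index(axis=1)
result = ordered.loc[:, ("bar", "two"):("baz", "two")]
print(result)
```

The frame in the question is already sorted, so this only matters if your real columns arrive in arbitrary order.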

Option 2: using slicers (docs section, cf. slice)

s.loc[:, slice(('bar', 'two'), ('baz', 'two'))]

The signature is slice(start, stop[, step]), so that ('bar', 'two') gets passed as start and ('baz', 'two') as stop.
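To see why the colon form and `slice(...)` are interchangeable, here is a small sketch. `KeyProbe` is a hypothetical helper (not part of pandas) that simply returns whatever key the square brackets hand over.

```python
class KeyProbe:
    """Hypothetical helper: returns the key that the brackets receive."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()

# Inside square brackets, start:stop is shorthand for slice(start, stop).
colon_key = probe[("bar", "two"):("baz", "two")]
explicit_key = slice(("bar", "two"), ("baz", "two"))

print(colon_key == explicit_key)
print(explicit_key.start, explicit_key.stop, explicit_key.step)
```

The colon shorthand is only valid directly inside square brackets. Wrapping it in parentheses, as in the question's attempt, is what triggers the SyntaxError.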

Option 3: using pd.IndexSlice

idx = pd.IndexSlice
s.loc[:, idx['bar', 'two']:idx['baz', 'two']]

This works like option 1 (start, colon, stop): idx['bar', 'two'] simply returns the tuple ('bar', 'two').


All three of these result in:

# using `np.random.seed(0)` for reproducibility

first        bar       baz          
second       two       one       two
0       0.400157  0.978738  2.240893
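For anyone who wants to verify this, the following sketch rebuilds the question's frame with the seed mentioned above and checks that the three options select the same columns. The names `opt1`, `opt2` and `opt3` are just for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # same seed as the output shown above

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
s = pd.DataFrame(np.random.randn(8), index=index).T

idx = pd.IndexSlice
opt1 = s.loc[:, ("bar", "two"):("baz", "two")]          # tuples with a colon
opt2 = s.loc[:, slice(("bar", "two"), ("baz", "two"))]  # explicit slice object
opt3 = s.loc[:, idx["bar", "two"]:idx["baz", "two"]]    # pd.IndexSlice

# All three selections should be identical frames.
print(opt1.equals(opt2) and opt1.equals(opt3))
print(opt1)
```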

Upvotes: 2

Koki

Reputation: 149

If you want the columns from (bar, two) through (baz, two), the following code works.

s.loc[:, ("bar", "two"):("baz", "two")]

The result looks like this:

first            bar                      baz
second           two          one         two
     0      0.625481    -2.139184   -1.066893
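The question mentions that the second level actually holds timestamps (years). Here is a minimal sketch of the same tuple range in that setting, assuming those labels are `pd.Timestamp` values and the columns are sorted. The frame `df`, the level name `year` and the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Assumed layout: first level holds strings, second level holds year timestamps.
years = pd.to_datetime(["2020-01-01", "2021-01-01"])
cols = pd.MultiIndex.from_product([["bar", "baz"], years], names=["first", "year"])
df = pd.DataFrame(np.arange(4).reshape(1, 4), columns=cols)

# Start and stop are tuples, one label per level, exactly as with strings.
start = ("bar", pd.Timestamp("2021-01-01"))
stop = ("baz", pd.Timestamp("2020-01-01"))
subset = df.loc[:, start:stop]
print(subset)
```

If the years are stored as plain integers instead, the tuples would hold integers, e.g. ("bar", 2021).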

Upvotes: 3
