jotasi

Reputation: 5177

Index DataFrame with MultiIndex Rows and Columns via another DataFrame containing row and column indices as columns

I have a list of particle pairs, in which each particle of a pair is identified by a chain index and an intra-chain index. I have saved these in a DataFrame (let's call it index_array). Now I want to plot a matrix of all particle pairs, where the matrix elements that correspond to a pair in the list get one color and all others get another. My idea was to build a DataFrame (let's call it to_fill) with chain index and intra-chain index as a MultiIndex for both rows and columns, which thus has two entries per pair, and then use index_array to index into to_fill and change the corresponding values, so that I can plot the values of to_fill via matplotlib.pyplot.pcolormesh.

So, to break it down into a more or less well-defined problem: I have a boolean DataFrame to_fill with multi-indexed rows and columns (2 levels each) that contains only False values. I also have another DataFrame, index_array, with four columns that hold the index values for the levels of both rows and columns. Now I want to set all elements pointed to by index_array to True. A toy version of both could, for example, be produced with the code below:

import numpy as np
import pandas as pd
lengths = pd.Series(data=[2, 4], index=[1, 2])  # Corresponds to the chains' lengths
index = pd.MultiIndex.from_tuples([(i, j) for i in lengths.index
                                          for j in np.arange(1, lengths.loc[i]+1)])
# np.bool was removed in NumPy 1.24; building the frame from the scalar False gives a bool dtype directly
to_fill = pd.DataFrame(False, index=index, columns=index)
print(to_fill)
#          1             2                     
#          1      2      1      2      3      4
# 1 1  False  False  False  False  False  False
#   2  False  False  False  False  False  False
# 2 1  False  False  False  False  False  False
#   2  False  False  False  False  False  False
#   3  False  False  False  False  False  False
#   4  False  False  False  False  False  False
index_array = pd.DataFrame([[1, 1, 1, 1],
                            [1, 1, 1, 2],
                            [2, 3, 2, 3],
                            [2, 3, 2, 4]],
                           columns=["i_1", "j_1", "i_2", "j_2"])
print(index_array)
#    i_1  j_1  i_2  j_2
# 0    1    1    1    1
# 1    1    1    1    2
# 2    2    3    2    3
# 3    2    3    2    4

Now I want to set all entries in to_fill that correspond to (i_1, j_1), (i_2, j_2) for a row in index_array to True. So basically, index_array refers to entries in to_fill that should be changed. The expected result would thus be:

print(to_fill)
#          1             2                     
#          1      2      1      2      3      4
# 1 1   True   True  False  False  False  False
#   2  False  False  False  False  False  False
# 2 1  False  False  False  False  False  False
#   2  False  False  False  False  False  False
#   3  False  False  False  False   True   True
#   4  False  False  False  False  False  False

But I did not manage to properly use index_array as an index. How can I tell to_fill to treat the indexing arrays i_1, j_1, i_2, and j_2 as corresponding index values for the levels of the row and column MultiIndex respectively?

Upvotes: 0

Views: 127

Answers (2)

Dickster

Reputation: 3009

This is a little better, though perhaps not by much:

tuples = [tuple(x) for x in index_array.values]
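# Stack both column levels, in this order, so the index becomes (i_1, j_1, i_2, j_2)
stacked = to_fill.stack(level=0).stack()
stacked.loc[tuples] = True
# Unstack back to the original layout and drop the all-NaN columns this creates
result = stacked.unstack(level=2).unstack().dropna(axis=1)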

Upvotes: 1

Dickster

Reputation: 3009

This is not great, as I would rather avoid iterrows() if it can be helped:

idx = pd.IndexSlice
for _, r in index_array.iterrows():
    # Unpack the row label (i_1, j_1) and the column label (i_2, j_2)
    i_1, j_1, i_2, j_2 = r[['i_1', 'j_1', 'i_2', 'j_2']]
    to_fill.loc[idx[i_1, j_1], idx[i_2, j_2]] = True
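
If the loop turns out to be too slow, one possible way around iterrows() is to translate the labels into integer positions first and then assign on the underlying NumPy array in one step. This is only a sketch under some assumptions: it rebuilds the toy data from the question so that it runs on its own, the names row_keys, col_keys, row_pos, col_pos and values are my own, and it assumes every pair in index_array actually exists in to_fill (it raises otherwise).

```python
import numpy as np
import pandas as pd

# Toy data, rebuilt from the question so the snippet is self-contained.
lengths = pd.Series(data=[2, 4], index=[1, 2])
index = pd.MultiIndex.from_tuples(
    [(i, j) for i in lengths.index for j in range(1, lengths.loc[i] + 1)]
)
to_fill = pd.DataFrame(False, index=index, columns=index)
index_array = pd.DataFrame(
    [[1, 1, 1, 1], [1, 1, 1, 2], [2, 3, 2, 3], [2, 3, 2, 4]],
    columns=["i_1", "j_1", "i_2", "j_2"],
)

# Build the row labels (i_1, j_1) and column labels (i_2, j_2) as MultiIndexes.
row_keys = pd.MultiIndex.from_frame(index_array[["i_1", "j_1"]])
col_keys = pd.MultiIndex.from_frame(index_array[["i_2", "j_2"]])

# Translate labels into integer positions along each axis.
row_pos = to_fill.index.get_indexer(row_keys)
col_pos = to_fill.columns.get_indexer(col_keys)

# get_indexer marks labels it cannot find with -1; stop rather than
# silently writing to the last row or column.
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError("index_array contains labels that are not in to_fill")

# Pairwise (not outer-product) assignment on a copy of the values,
# then wrap the result in a frame with the original labels.
values = to_fill.to_numpy(copy=True)
values[row_pos, col_pos] = True
to_fill = pd.DataFrame(values, index=to_fill.index, columns=to_fill.columns)
```

The assignment values[row_pos, col_pos] pairs the two position arrays element by element, which is what the question asks for; passing the same labels to .loc as two lists would instead select every row/column combination.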

Upvotes: 1
