Zhang18
Zhang18

Reputation: 4930

Fastest way to set elements of Pandas Dataframe based on a function with index and column value as input

I have a single column Pandas dataframe:

s = 
      VALUE
INDEX
A     12
B     21
C     7
...
Y     21
Z     7

I want to make it into a square matrix mask with the same index and columns as s.index, with each element either True if the value of column and index are the same in s or False otherwise.

mask = 
      A     B     C ...      Y     Z 
A  True False False ...  False False
B False  True False ...   True False
C False False  True ...  False  True
...
Y False  True False ...   True False
Z False False  True ...  False  True

My actual s has 10K+ rows. What is the fastest way to generate that mask DataFrame?

One way I've tried is to create a 2-level dictionary with two for loops. (eg. dict['A']['B'] = dict['B']['A'] = True if s.loc['A'] == s.loc['B'] else False, etc.). Then convert bottom level of dict to a Pandas Series (eg. row = pd.Series(dict[A])), then append that series to mask. mask is constructed iteratively row by row.

That takes really long time, and has to explicitly loop through 10K x 10K / 2 = 50M elements... Not ideal?

Upvotes: 3

Views: 389

Answers (1)

piRSquared
piRSquared

Reputation: 294348

Use numpy broadcasting

v = s.VALUE.values
pd.DataFrame(v == v[:, None], s.index, s.index)

INDEX      A      B      C      Y      Z
INDEX                                   
A       True  False  False  False  False
B      False   True  False   True  False
C      False  False   True  False   True
Y      False   True  False   True  False
Z      False  False   True  False   True

Upvotes: 2

Related Questions