Reputation: 4930
I have a single column Pandas dataframe:
s =
VALUE
INDEX
A 12
B 21
C 7
...
Y 21
Z 7
I want to make it into a square matrix mask with the same index and columns as s.index
, with each element either True
if the value of column and index are the same in s
or False
otherwise.
mask =
A B C ... Y Z
A True False False ... False False
B False True False ... True False
C False False True ... False True
...
Y False True False ... True False
Z False False True ... False True
My actual s
has 10K+ rows. What is the fastest way to generate that mask
DataFrame?
One way I've tried is to create a 2-level dictionary with two for
loops. (eg. dict['A']['B'] = dict['B']['A'] = True if s.loc['A'] == s.loc['B'] else False
, etc.). Then convert bottom level of dict
to a Pandas Series (eg. row = pd.Series(dict[A])
), then append that series to mask
. mask
is constructed iteratively row by row.
That takes really long time, and has to explicitly loop through 10K x 10K / 2 = 50M elements... Not ideal?
Upvotes: 3
Views: 389
Reputation: 294348
Use numpy
broadcasting
v = s.VALUE.values
pd.DataFrame(v == v[:, None], s.index, s.index)
INDEX A B C Y Z
INDEX
A True False False False False
B False True False True False
C False False True False True
Y False True False True False
Z False False True False True
Upvotes: 2