Reputation: 78723
I'm doing some matrix algebra using the very lovely pandas library in Python. I'm really enjoying using the Series and DataFrame objects because of the ability to name rows and columns.
But is there a neat way to diagonalise a Series while maintaining row/column names?
Consider this minimum working example:
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
>>> s
a 0.137477
b -0.606762
c 0.085030
d -0.571760
e -0.475104
dtype: float64
Now, I can do:
>>> import numpy as np
>>> np.diag(s)
array([[ 0.13747693, 0. , 0. , 0. , 0. ],
[ 0. , -0.60676226, 0. , 0. , 0. ],
[ 0. , 0. , 0.08502993, 0. , 0. ],
[ 0. , 0. , 0. , -0.57176048, 0. ],
[ 0. , 0. , 0. , 0. , -0.47510435]])
But I'd love to find a way of producing a DataFrame that looks like:
a b c d e
0 0.137477 0.000000 0.00000 0.00000 0.000000
1 0.000000 -0.606762 0.00000 0.00000 0.000000
2 0.000000 0.000000 0.08503 0.00000 0.000000
3 0.000000 0.000000 0.00000 -0.57176 0.000000
4 0.000000 0.000000 0.00000 0.00000 -0.475104
or perhaps even (which would be even better!):
a b c d e
a 0.137477 0.000000 0.00000 0.00000 0.000000
b 0.000000 -0.606762 0.00000 0.00000 0.000000
c 0.000000 0.000000 0.08503 0.00000 0.000000
d 0.000000 0.000000 0.00000 -0.57176 0.000000
e 0.000000 0.000000 0.00000 0.00000 -0.475104
This would be great because, with such a DataFrame (call it S), I could do matrix operations like:
>>> S.dot(s)
a 0.018900
b 0.368160
c 0.007230
d 0.326910
e 0.225724
dtype: float64
and retain the names.
Many thanks in advance, as always. Rob
Upvotes: 5
Views: 4302
Reputation: 128948
How about wrapping np.diag(s) in a DataFrame and passing the Series index as both the row index and the columns:
In [107]: pd.DataFrame(np.diag(s), index=s.index, columns=s.index)
Out[107]:
a b c d e
a 0.630529 0.000000 0.000000 0.000000 0.000000
b 0.000000 0.360884 0.000000 0.000000 0.000000
c 0.000000 0.000000 0.345719 0.000000 0.000000
d 0.000000 0.000000 0.000000 0.796625 0.000000
e 0.000000 0.000000 0.000000 0.000000 -0.176848
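To tie this back to the S.dot(s) use case in the question, here is a minimal sketch that wraps the one-liner in a helper and checks that the labels survive the product. The name diag_frame and the small fixed Series are only illustrative assumptions, not anything from pandas itself.

```python
import numpy as np
import pandas as pd


def diag_frame(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: square DataFrame with s on the diagonal,
    labelled by s.index on both axes."""
    return pd.DataFrame(np.diag(s.to_numpy()), index=s.index, columns=s.index)


# Fixed values instead of random ones so the result is easy to check by hand.
s = pd.Series([1.0, -2.0, 3.0], index=["a", "b", "c"])
S = diag_frame(s)

# DataFrame.dot aligns S's columns with s's index, so the result is a
# Series labelled by S's row index.
result = S.dot(s)
print(S)
print(result)
```

Because the diagonal matrix multiplies each entry by itself, result should hold the squares of s under the same labels a, b, c.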
Upvotes: 7