Eagle

Using pandas dataframe with Scipy

Assume a pandas DataFrame, df, of size n x m.

I would like to perform linear algebra operations on df.

So far, I have been unable to find a way to perform linear algebra directly on df. What I was able to find is how to convert df from pandas format to NumPy using:

A = df.as_matrix()

then I can simply do

linalg.inv(A)
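For reference, `as_matrix` was deprecated in pandas 0.23 and removed in pandas 1.0; in current pandas the equivalent conversion is `DataFrame.to_numpy()` (or the older `.values` attribute). Below is a minimal sketch of the convert-then-invert route, assuming pandas 0.24 or newer and a square, non-singular frame. The data is made up (random values plus a multiple of the identity so the matrix is safely invertible).

```python
import numpy as np
import pandas as pd
from scipy import linalg

# Illustrative square DataFrame (an inverse requires n == m).
# Adding 4 * I makes it diagonally dominant, hence invertible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 4)) + 4 * np.eye(4), columns=list("abcd"))

# Convert to a plain NumPy array (modern replacement for df.as_matrix())
A = df.to_numpy()

# Invert with SciPy
A_inv = linalg.inv(A)

# Sanity check: A @ A_inv should be (numerically) the identity matrix
print(np.allclose(A @ A_inv, np.eye(4)))  # True
```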

Is there a direct way of performing linear algebra operations in SciPy on a pandas DataFrame? For example:

linalg.inv(df)

The reason I would like to use the linear algebra operations from SciPy instead of NumPy is the following:

In any case, SciPy contains more fully-featured versions of the linear algebra modules, as well as many other numerical algorithms. If you are doing scientific computing with python, you should probably install both NumPy and SciPy. Most new features belong in SciPy rather than NumPy.

(from "What is the difference between NumPy and SciPy?")
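As one concrete illustration of that quote, `scipy.linalg` provides routines that `numpy.linalg` lacks, for example the LU decomposition `scipy.linalg.lu`. A small sketch, assuming a square numeric DataFrame; the values are made up for the example:

```python
import numpy as np
import pandas as pd
from scipy.linalg import lu

df = pd.DataFrame([[4.0, 3.0, 2.0],
                   [2.0, 1.0, 3.0],
                   [3.0, 2.0, 1.0]], columns=list("abc"))

# LU decomposition with partial pivoting: df = P @ L @ U
# (numpy.linalg has no equivalent function)
P, L, U = lu(df)

# L is lower triangular, U is upper triangular
print(np.allclose(P @ L @ U, df.to_numpy()))  # True
```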

Answers (1)

MaxU - stand with Ukraine

You can pass your DataFrames directly to the SciPy linear algebra functions.

Demo:

In [111]: from scipy.linalg import inv

In [112]: df = pd.DataFrame(np.random.rand(5,5), columns=list('abcde'))

In [113]: df
Out[113]:
          a         b         c         d         e
0  0.619086  0.229390  0.361611  0.857177  0.274983
1  0.389630  0.689562  0.687043  0.388781  0.781168
2  0.702920  0.253870  0.881173  0.858378  0.363035
3  0.007022  0.571111  0.408729  0.708862  0.042882
4  0.876747  0.170775  0.499824  0.929295  0.762971

In [114]: inv(df)
Out[114]:
array([[ 5.67652746,  1.54854922, -0.21927114, -3.04884324, -3.35567433],
       [ 4.32996215,  1.99787442, -1.18579234, -0.9802008 , -2.98677673],
       [-2.43833426, -0.29287732,  2.11691208,  0.34655505,  0.1519223 ],
       [-1.92398165, -1.43903773, -0.22722582,  1.96404685,  2.16451337],
       [-3.55144126, -0.28205091, -0.59264783,  1.10366465,  3.09938364]])
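This works because `scipy.linalg` functions convert array-like inputs (including DataFrames and Series) to NumPy arrays internally, which is also why the result above is a plain `ndarray` without the DataFrame's labels. A hedged sketch showing the same behaviour with `solve` and `det`, using a small made-up system:

```python
import numpy as np
import pandas as pd
from scipy import linalg

# Illustrative 2x2 system: 2a + b = 3, a + 3b = 5
df = pd.DataFrame([[2.0, 1.0],
                   [1.0, 3.0]], columns=["a", "b"])
rhs = pd.Series([3.0, 5.0])

# DataFrame and Series are accepted directly; SciPy converts them to arrays
x = linalg.solve(df, rhs)  # solves df @ x = rhs
d = linalg.det(df)         # determinant of df

print(type(x))  # <class 'numpy.ndarray'>
print(x, d)
```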

PS: I used pandas 0.19.2 and SciPy 0.18.1 for this demo.

UPDATE: if you want to get a DataFrame as a result:

In [4]: pd.DataFrame(inv(df), columns=df.columns, index=df.index)
Out[4]:
          a         b         c         d         e
0  5.676507  1.548541 -0.219275 -3.048828 -3.355657
1  4.329938  1.997865 -1.185791 -0.980187 -2.986760
2 -2.438323 -0.292872  2.116913  0.346547  0.151914
3 -1.923971 -1.439034 -0.227226  1.964040  2.164506
4 -3.551428 -0.282045 -0.592647  1.103655  3.099373
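One subtlety with the labels: since `df @ df_inv` must be the identity, the rows of the inverse pair up with the columns of `df`, and its columns pair up with the index of `df`. The sketch below labels the result that way (the reverse of the snippet above, which reuses `df.index` and `df.columns` as-is) and checks it with `DataFrame.dot`, which aligns `df.columns` against the other frame's index. The data and labels are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.linalg import inv

df = pd.DataFrame([[2.0, 1.0],
                   [1.0, 3.0]], index=["x", "y"], columns=["a", "b"])

# Rows of the inverse are indexed by df's columns, columns by df's index,
# so that df.dot(df_inv) aligns df.columns with df_inv.index
df_inv = pd.DataFrame(inv(df), index=df.columns, columns=df.index)

# df @ df_inv should be the identity, labelled by df.index on both axes
check = df.dot(df_inv)
print(np.allclose(check.to_numpy(), np.eye(2)))  # True
```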
