Reputation: 4954
Let's say I have a DataFrame, df, with 10 columns and several hundred rows. The columns are labeled A, B, C, ... Further, I have a pandas Series, s, containing data of the same length (several hundred rows). What I would like to do is get the covariance of each of the columns in df with the series s. Something like:
cov_s
A 0.003
B 0.0089
C 0.0032
...
J 0.0192
I would like to avoid adding s as a column of df, calling df.cov(), and taking the one column under the added s, because my data sets are likely to get quite large, and computing a full covariance matrix may have convergence issues (whereas computing just a two-series covariance won't). Any ideas on how to accomplish this?
Upvotes: 2
Views: 2216
Reputation: 13437
You can use apply to get the covariance of s with each column fairly easily.
Set up data:
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.rand(20, 5), columns=list("ABCDE"))
s = pd.Series(np.random.rand(20))
print(df.head())
print()
print(s.head())
A B C D E
0 0.548814 0.715189 0.602763 0.544883 0.423655
1 0.645894 0.437587 0.891773 0.963663 0.383442
2 0.791725 0.528895 0.568045 0.925597 0.071036
3 0.087129 0.020218 0.832620 0.778157 0.870012
4 0.978618 0.799159 0.461479 0.780529 0.118274
0 0.677817
1 0.270008
2 0.735194
3 0.962189
4 0.248753
dtype: float64
Using apply to get the covariance:
df.apply(lambda column: s.cov(column))
A -0.011373
B -0.017225
C -0.014311
D 0.004783
E 0.015021
dtype: float64
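If apply ever becomes a bottleneck on very wide frames, the same per-column covariances can also be computed with vectorized pandas arithmetic straight from the definition of sample covariance. This is just a sketch, not a built-in shortcut, and it assumes there are no missing values (Series.cov handles NaNs pairwise, this version does not):
demeaned_df = df - df.mean()   # center each column
demeaned_s = s - s.mean()      # center the series
cov = demeaned_df.mul(demeaned_s, axis=0).sum() / (len(df) - 1)
print(cov)
This produces the same values as the apply version above, since pandas' cov also uses the n - 1 denominator.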
Upvotes: 3