Reputation: 4954
Let's say I have a DataFrame, df, with 10 columns and several hundred rows. The columns are labeled A, B, C, ... Further, I have a pandas Series, s, containing data of the same length (several hundred rows). What I would like to do is get the covariance of each of the columns in df with the series s. Something like:
cov_s
A 0.003
B 0.0089
C 0.0032
...
J 0.0192
I would like to avoid adding s as a column of df, calling df.cov(), and taking the one column under the added s, because my data sets are likely to get quite large, and computing a full covariance matrix may have convergence issues (whereas computing just a two-series covariance won't). Any ideas on how to accomplish this?
Upvotes: 2
Views: 2216
Reputation: 13437
You can use apply to get the covariance of s with each column fairly easily.
Set up data:
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.rand(20, 5), columns=list("ABCDE"))
s = pd.Series(np.random.rand(20))
print(df.head())
print()
print(s.head())
A B C D E
0 0.548814 0.715189 0.602763 0.544883 0.423655
1 0.645894 0.437587 0.891773 0.963663 0.383442
2 0.791725 0.528895 0.568045 0.925597 0.071036
3 0.087129 0.020218 0.832620 0.778157 0.870012
4 0.978618 0.799159 0.461479 0.780529 0.118274
0 0.677817
1 0.270008
2 0.735194
3 0.962189
4 0.248753
dtype: float64
Using apply to get the covariance:
df.apply(lambda column: s.cov(column))
A -0.011373
B -0.017225
C -0.014311
D 0.004783
E 0.015021
dtype: float64
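If apply ever becomes a bottleneck on very wide frames, the same per-column covariances can also be computed with vectorized pandas arithmetic straight from the definition of sample covariance. This is just a sketch, not a built-in shortcut, and it assumes there are no missing values (Series.cov handles NaNs pairwise, this version does not):
demeaned_df = df - df.mean()   # center each column
demeaned_s = s - s.mean()      # center the series
cov = demeaned_df.mul(demeaned_s, axis=0).sum() / (len(df) - 1)
print(cov)
This produces the same values as the apply version above, since pandas' cov also uses the n - 1 denominator.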
Upvotes: 3