natalien

Reputation: 81

Calculating Covariance in Python

I was wondering if someone could give me tips on how to calculate covariance in Python; I do not want to use anything from numpy. I just want to learn how to do this manually and get practice with for loops.

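Basically, I want to calculate the covariance of X and Y, where P[i][j] is the joint probability that X takes the value X[i] and Y takes the value Y[j]: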

X = [1,2]
Y = [1,2,3]
P = [[0.25,0.25,0.0], [0.0, 0.25, 0.25]]

Mean of X: 1.5
Mean of Y: 2

These values are taken from: https://onlinecourses.science.psu.edu/stat414/node/109

The result of this should be 0.25.

I have been looping through X, Y, and P in nested for loops, but I do not know how to combine them into a single sum.

I basically want to do this calculation:

(1-1.5)(1-2)(0.25) + (1-1.5)(2-2)(0.25) +  ..... + (2-1.5)(3-2)(0.25)
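
In general form, the sum above matches the usual definition of covariance for two discrete random variables. The notation here is chosen for illustration: f(x, y) stands for the joint probability (the entries of P), and \mu_X and \mu_Y stand for the means listed above.

```latex
\operatorname{Cov}(X, Y) = \sum_{x} \sum_{y} (x - \mu_X)(y - \mu_Y)\, f(x, y)
```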

Upvotes: 1

Views: 3758

Answers (2)

Martin Evans

Reputation: 46779

The product function from Python's itertools module can also help here. Combined with enumerate, it yields every pairing of an (index, value) tuple from X with one from Y, which gives the indexes needed to look up P:

from itertools import product

X = [1, 2]
Y = [1, 2, 3]
P = [[0.25,0.25,0.0], [0.0, 0.25, 0.25]]

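# Convert to float before dividing so Python 2 does not truncate the mean
mean_x = float(sum(X)) / len(X)
mean_y = float(sum(Y)) / len(Y)

# Unpack each (index, value) pair; the indexes select the matching entry of P
print(sum((x - mean_x) * (y - mean_y) * P[i][j]
          for (i, x), (j, y) in product(enumerate(X), enumerate(Y))))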

Giving the result:

0.25
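
To see what the generator expression iterates over, it may help to print the pairs that product and enumerate produce together. This is only an illustrative sketch; the name `pairs` is introduced here and is not part of the answer's code.

```python
from itertools import product

X = [1, 2]
Y = [1, 2, 3]

# Each element pairs an (index, value) tuple from X with one from Y
pairs = list(product(enumerate(X), enumerate(Y)))

for (i, x), (j, y) in pairs:
    # i and j are the row and column to use when indexing P
    print(i, x, j, y)
```

There are six pairs in total, one for each entry of P, starting with ((0, 1), (0, 1)) and ending with ((1, 2), (2, 3)).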

Upvotes: 1

Michael Recachinas

Reputation: 2749

To calculate the covariance, you'll want something like the code below: a nested loop over both lists that accumulates each term of the covariance formula.

# the data from the question
X = [1, 2]
Y = [1, 2, 3]
P = [[0.25, 0.25, 0.0], [0.0, 0.25, 0.25]]

# let's get the mean of `X` (add all the vals in `X` and divide by
# the length)
x_mean = float(sum(X)) / len(X)

# now, let's get the mean for `Y`
y_mean = float(sum(Y)) / len(Y)

# initialize the covariance to 0 so we can add it up
cov = 0

# we'll use a nested loop structure -- the outer loop can be through `Y`
# or `X`, it doesn't matter in this case
# we'll use python's `enumerate`, which lets us iterate through the `list`
# using a `tuple` that contains (the_current_index, the_current_element),
# or in `C`/`Java` terms, `(i, arr[i])`
for y_idx, y in enumerate(Y):
    for x_idx, x in enumerate(X):

        # the covariance is defined by the following equation
        # you don't need to loop through `P` -- the outer list
        # contains 2 elements, which is the size of `X`, and
        # the inner list contains 3 elements, which is the size of `Y`
        cov += (x - x_mean) * (y - y_mean) * P[x_idx][y_idx]

print(cov)  # => 0.25
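
One caveat: taking the means as sum(X) / len(X) and sum(Y) / len(Y) gives the right values here because the marginal probabilities in P happen to be symmetric around those means. For an arbitrary joint table, the means should be weighted by the probabilities. A minimal sketch of that variant follows; the function name `joint_covariance` and its parameter names are hypothetical and are not taken from either answer.

```python
def joint_covariance(xs, ys, pmf):
    """Covariance of X and Y, where pmf[i][j] = P(X = xs[i], Y = ys[j])."""
    # Probability-weighted means: E[X] and E[Y] under the joint distribution
    mean_x = 0.0
    mean_y = 0.0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            mean_x += x * pmf[i][j]
            mean_y += y * pmf[i][j]

    # Accumulate (x - E[X]) * (y - E[Y]) * P(x, y) over every cell of the table
    cov = 0.0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            cov += (x - mean_x) * (y - mean_y) * pmf[i][j]
    return cov


X = [1, 2]
Y = [1, 2, 3]
P = [[0.25, 0.25, 0.0], [0.0, 0.25, 0.25]]

print(joint_covariance(X, Y, P))
```

For the data in the question, the weighted means come out to 1.5 and 2, so the result agrees with the 0.25 computed above.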

Upvotes: 3
