silver arrow

Reputation: 117

Simplest way to turn a covariance table into a covariance matrix in numpy

Say you load in a table of pairwise covariances that looks like this:

pd.DataFrame([['A', 'B', 1], ['B', 'C', 2], ['A', 'C', 3], ['C', 'D', 100]])

1 is the covariance between A and B, 2 is the covariance between B and C, etc.

What's the most elegant (readable + efficient) way to convert this to:

np.array([[1, 1, 3, 0], [1, 1, 2, 0], [3, 2, 1, 100], [0, 0, 100, 1]])

The full covariance matrix for A, B, C, and D, assuming that unwritten relationships are 0 (true in my case). Variables can be in any order.

Upvotes: 1

Views: 180

Answers (1)

Jorge Morgado

Reputation: 1178

First, as I mentioned in a comment, an ndarray that mixes strings and numbers will convert everything to strings. So you must convert the correlation values in the first table back to float.
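A small sketch of that conversion, in case it helps. The names mixed and values are only illustrative and the two rows are taken from the question's table:

```python
import numpy as np

# Mixing labels and numbers in one array makes NumPy pick a string dtype.
mixed = np.array([['A', 'B', 1], ['B', 'C', 2]])
print(mixed.dtype.kind)  # 'U' means unicode strings

# The value column has to be converted back to float explicitly.
values = mixed[:, 2].astype(float)
```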

Assuming you have the first table like this:

correlations = np.array([[0, 1, 1], [1, 2, 2], [0, 2, 3], [2, 3, 100]])

Where 'A' is 0, 'B' is 1, and so on...

you can create the matrix you need like this:

matrix = np.identity(var_counts)  # ones on the diagonal, zeros elsewhere
for i, j, value in correlations:
    i, j = int(i), int(j)  # array indices must be plain ints
    matrix[i, j] = matrix[j, i] = value  # fill both halves so the matrix stays symmetric

Where var_counts is the number of variables you have in the first table.
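If the table still has the letter labels, one possible way to build the label-to-index mapping and fill the matrix without a Python loop is sketched below. This is only a sketch under some assumptions: the rows are held as plain tuples (so nothing is turned into strings), the sorted labels define the row and column order, and the names rows, labels, and index are made up for illustration.

```python
import numpy as np

# Rows of the question's table: (first variable, second variable, value).
rows = [('A', 'B', 1), ('B', 'C', 2), ('A', 'C', 3), ('C', 'D', 100)]

# Sorted unique labels fix the row/column order: 'A' -> 0, 'B' -> 1, ...
labels = sorted({name for first, second, _ in rows for name in (first, second)})
index = {name: position for position, name in enumerate(labels)}

# Integer index arrays and the values, one entry per table row.
i = np.array([index[first] for first, _, _ in rows])
j = np.array([index[second] for _, second, _ in rows])
values = np.array([value for _, _, value in rows], dtype=float)

# Start from the identity, then fill both triangles with fancy indexing.
matrix = np.identity(len(labels))
matrix[i, j] = values
matrix[j, i] = values
```

Here len(labels) plays the role of var_counts, and pairs that never appear in the table stay at 0.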

Upvotes: 1
