Reputation: 117
Say you load in a table of correlations that looks like this:
pd.DataFrame([['A', 'B', 1], ['B', 'C', 2], ['A', 'C', 3], ['C', 'D', 100]])
1 is the covariance between A and B, 2 is the covariance between B and C, etc.
What's the most elegant (readable + efficient) way to convert this to:
np.array([[1, 1, 3, 0], [1, 1, 2, 0], [3, 2, 1, 100], [0, 0, 100, 1]])
This is the full covariance matrix for A, B, C, and D, assuming that unwritten relationships are 0 (true in my case). The variables can appear in any order.
Upvotes: 1
Views: 180
Reputation: 1178
First, as I mentioned in a comment, an ndarray that mixes strings and numbers will convert everything to strings, so you must convert the correlation values in the first table back to float.
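A minimal sketch of that conversion issue, using two rows from the question's table. The names arr and values are only illustrative; the exact string width of the dtype can vary, so only its kind is checked here.

```python
import numpy as np

# Mixing strings and numbers in one ndarray yields a string dtype.
arr = np.array([['A', 'B', 1], ['B', 'C', 2]])
print(arr.dtype.kind)  # 'U' means a unicode string dtype

# Convert the value column back to float before using it numerically.
values = arr[:, 2].astype(float)
print(values)
```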
Assuming you have the first table like this:
correlations = np.array([[0, 1, 1], [1, 2, 2], [0, 2, 3], [2, 3, 100]])
where 'A' is 0, 'B' is 1, and so on, you can create the matrix you need like this:
matrix = np.identity(var_counts)
for correlation in correlations:
    i, j, value = correlation
    i, j = int(i), int(j)
    # Fill both halves so the matrix stays symmetric.
    matrix[i, j] = matrix[j, i] = value
where var_counts is the number of variables you have in the first table.
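If you would rather start from the labelled DataFrame in the question, one possible sketch is below. It assumes the variables should be ordered alphabetically (any fixed order would do) and that pandas is available; labels and index are hypothetical helper names, not anything from the question.

```python
import numpy as np
import pandas as pd

# Table from the question: (variable, variable, value) triples.
df = pd.DataFrame([['A', 'B', 1], ['B', 'C', 2], ['A', 'C', 3], ['C', 'D', 100]])

# Assumption: alphabetical order of the labels defines rows and columns.
labels = sorted(set(df[0]) | set(df[1]))
index = {label: k for k, label in enumerate(labels)}

var_counts = len(labels)
matrix = np.identity(var_counts)  # ones on the diagonal, zeros elsewhere
for a, b, value in df.itertuples(index=False):
    i, j = index[a], index[b]
    # Fill both halves so the matrix stays symmetric.
    matrix[i, j] = matrix[j, i] = value

print(matrix)
```

Because the values stay in their own numeric column of the DataFrame, no string-to-float conversion is needed in this variant.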
Upvotes: 1