Reputation: 195
This is my dataframe:
id age gender height weight ap_hi \
id 1.000000 0.002623 0.003799 0.000221 0.000144 0.003489
age 0.002623 1.000000 -0.018274 -0.077426 0.069705 0.018482
gender 0.003799 -0.018274 1.000000 0.504722 0.130116 0.004941
height 0.000221 -0.077426 0.504722 1.000000 0.248868 0.004300
weight 0.000144 0.069705 0.130116 0.248868 1.000000 0.026527
ap_hi 0.003489 0.018482 0.004941 0.004300 0.026527 1.000000
ap_lo 0.000429 0.152787 0.059500 0.015356 0.223786 0.072260
cholesterol 0.003867 0.129582 -0.037669 -0.064477 0.132686 0.022606
gluc 0.002477 0.087280 -0.021178 -0.031410 0.104475 0.011004
smoke -0.002403 -0.044208 0.337682 0.187389 0.055805 -0.001978
alco -0.001039 -0.026956 0.169178 0.089257 0.058286 0.000607
active 0.005890 -0.011471 0.007702 -0.005042 -0.012112 -0.000162
cardio 0.003770 0.239987 0.001727 -0.025673 0.166886 0.050321
overweight -0.000769 0.089282 -0.055146 -0.156139 0.655764 0.016900
ap_lo cholesterol gluc smoke alco active \
id 0.000429 0.003867 0.002477 -0.002403 -0.001039 0.005890
age 0.152787 0.129582 0.087280 -0.044208 -0.026956 -0.011471
gender 0.059500 -0.037669 -0.021178 0.337682 0.169178 0.007702
height 0.015356 -0.064477 -0.031410 0.187389 0.089257 -0.005042
weight 0.223786 0.132686 0.104475 0.055805 0.058286 -0.012112
ap_hi 0.072260 0.022606 0.011004 -0.001978 0.000607 -0.000162
ap_lo 1.000000 0.148701 0.073920 0.022997 0.031839 0.002184
cholesterol 0.148701 1.000000 0.383601 0.012797 0.037588 0.002804
gluc 0.073920 0.383601 1.000000 -0.004203 0.013617 -0.009629
smoke 0.022997 0.012797 -0.004203 1.000000 0.341434 0.027203
alco 0.031839 0.037588 0.013617 0.341434 1.000000 0.026224
active 0.002184 0.002804 -0.009629 0.027203 0.026224 1.000000
cardio 0.326125 0.202257 0.088267 -0.020605 -0.011528 -0.037040
overweight 0.169567 0.126770 0.086850 -0.003981 0.024210 -0.002382
cardio overweight
id 0.003770 -0.000769
age 0.239987 0.089282
gender 0.001727 -0.055146
height -0.025673 -0.156139
weight 0.166886 0.655764
ap_hi 0.050321 0.016900
ap_lo 0.326125 0.169567
cholesterol 0.202257 0.126770
gluc 0.088267 0.086850
smoke -0.020605 -0.003981
alco -0.011528 0.024210
active -0.037040 -0.002382
cardio 1.000000 0.141138
overweight 0.141138 1.000000
This DataFrame is a correlation matrix, and I want to draw a heatmap from it. Because the matrix is symmetric, I want to remove the redundant upper triangular values, including the diagonal. How can I do this?
Upvotes: 0
Views: 1160
Reputation: 11171
You can use NumPy's np.tril function to keep only the lower triangular part of your matrix; passing k=-1 also zeroes out the diagonal. Assuming the DataFrame is square, this should work:
import numpy as np
df[:] = np.tril(df.values, k=-1)
input example:
id age gender height weight ap_hi
id 1.000000 0.002623 0.003799 0.000221 0.000144 0.003489
age 0.002623 1.000000 -0.018274 -0.077426 0.069705 0.018482
gender 0.003799 -0.018274 1.000000 0.504722 0.130116 0.004941
height 0.000221 -0.077426 0.504722 1.000000 0.248868 0.004300
weight 0.000144 0.069705 0.130116 0.248868 1.000000 0.026527
ap_hi 0.003489 0.018482 0.004941 0.004300 0.026527 1.000000
output:
id age gender height weight ap_hi
id 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
age 0.002623 0.000000 0.000000 0.000000 0.000000 0.0
gender 0.003799 -0.018274 0.000000 0.000000 0.000000 0.0
height 0.000221 -0.077426 0.504722 0.000000 0.000000 0.0
weight 0.000144 0.069705 0.130116 0.248868 0.000000 0.0
ap_hi 0.003489 0.018482 0.004941 0.004300 0.026527 0.0
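One caveat for the heatmap use case: zeros are still valid values, so the cleared cells would be drawn in the colour for 0 rather than left blank. A minimal sketch of an alternative is below, which sets those cells to NaN with a boolean mask instead. The random data and the names data, corr, upper and lower_only are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the correlation DataFrame in the question.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(50, 4)),
    columns=["age", "height", "weight", "ap_hi"],
)
corr = data.corr()

# Boolean array that is True on and above the diagonal (the cells to hide).
upper = np.triu(np.ones(corr.shape, dtype=bool))

# DataFrame.mask replaces every cell where the condition is True with NaN,
# leaving the strictly lower triangle untouched.
lower_only = corr.mask(upper)

print(lower_only)
```

If you plot with seaborn, the same boolean array can also be passed as sns.heatmap(corr, mask=upper), which leaves corr itself unchanged and simply skips drawing the masked cells.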
Upvotes: 1