Droid-Bird

Reputation: 1505

How to draw a heatmap of similarity from two one dimensional arrays in python?

I have two arrays as follows:

a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])

I want to plot a heatmap showing how close array a is to array b. If possible, the a value and b value should be shown on mouse hover. For example, 1 in array a is not close to 20 in array b, so that cell should get a lighter color. Any idea where to start? Thank you.

Upvotes: 1

Views: 2958

Answers (2)

Arne

Reputation: 10545

Scikit-learn has a handy function to compute the pairwise distances. You just need to reshape the arrays, because it expects 2d arrays as input. Then I would also use seaborn, as Eduardo suggested.

import numpy as np
from sklearn.metrics import pairwise_distances
import seaborn as sns

a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])

distances = pairwise_distances(X=a.reshape(-1, 1), Y=b.reshape(-1, 1))

sns.heatmap(distances, square=True, annot=True, cbar=False, cmap='Blues');

heatmap
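For one-dimensional inputs like these, the default Euclidean distance reduces to the absolute difference |a[i] - b[j]|, so the same matrix can also be built with plain NumPy broadcasting. This is only a sketch of an equivalent computation, not a replacement for the scikit-learn call above:

```python
import numpy as np

a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])

# a[:, None] has shape (7, 1) and b[None, :] has shape (1, 7);
# broadcasting yields a (7, 7) matrix where entry [i, j] is |a[i] - b[j]|.
distances = np.abs(a[:, None] - b[None, :])
```

Rows correspond to elements of a and columns to elements of b, which matches the layout of the heatmap above.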

Edit: To reverse the colors, you can use the colormap 'Blues_r' instead. I don't know if there is a way to flip the y-axis at the seaborn level, but you can always flip the input data and change the labels accordingly:

distances = pairwise_distances(X=np.flip(a).reshape(-1, 1), Y=b.reshape(-1, 1))
sns.heatmap(distances, square=True, annot=True, cbar=False, cmap='Blues_r', 
            yticklabels=list(reversed(range(len(a)))));

heatmap, flipped version
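Another option that may work for flipping the y-axis: sns.heatmap returns a matplotlib Axes, and matplotlib's Axes.invert_yaxis() can be called on it. A minimal sketch, assuming a recent seaborn version (which draws row 0 at the top by default); the distances are computed here with NumPy broadcasting to keep the snippet self-contained:

```python
import numpy as np
import seaborn as sns

a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])

# |a[i] - b[j]| for every pair; equals the Euclidean distance for 1-D inputs
distances = np.abs(a[:, None] - b[None, :])

ax = sns.heatmap(distances, square=True, annot=True, cbar=False, cmap='Blues_r')
# Flip the axis after plotting so that row 0 ends up at the bottom;
# the tick labels move with the rows, so no relabeling is needed.
ax.invert_yaxis()
```

With this approach the input data stays in its original order.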

Upvotes: 3

mozway

Reputation: 260490

What does your data represent? There are many ways to compare things and determine whether they are different. You could compute the difference, the ratio, etc. There is no single right way to address your question without a bit more context.
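As a rough illustration of the two comparisons mentioned, here is a sketch of the element-wise difference and ratio for the arrays in the question. The variable names are just for illustration; since b contains a zero, the division is guarded and that position is left as NaN:

```python
import numpy as np

a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])

# Element-wise difference: positive where a is larger than b
difference = a - b

# Element-wise ratio; only divide where b is non-zero,
# positions with b == 0 keep the NaN placeholder
ratio = np.divide(a, b, out=np.full_like(a, np.nan), where=b != 0)
```

Which of the two is more meaningful depends on what the numbers represent.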

If your two values are supposed to be proportional, I would plot them as a scatter plot with each one as an axis:

import numpy as np
import pandas as pd
a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])
df = pd.DataFrame({'a': a, 'b': b})
df.plot.scatter(x='a', y='b')

a vs b scatterplot

You could also use seaborn's regplot:

import seaborn as sns
ax = sns.regplot(data=df, x='a', y='b', robust=True)

a vs b regplot

If you really want to use a heatmap, I would go for a clustermap, as this groups similar values together and separates them from those that are different:

sns.clustermap(df)

clustermap

Use the annot=True parameter to display the values:

sns.clustermap(df, annot=True)

clustermap with annot

Upvotes: 1
