Reputation: 1505
I have two arrays as follows:
a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])
I want to plot a heatmap showing how close array a is to array b. If possible, it should show the a value and b value on mouse hover. For example, 1 in array a is not close to 20 in array b, so that cell should have a lighter color, and so on. Any idea where to start? Thank you.
Upvotes: 1
Views: 2958
Reputation: 10545
Scikit-learn has a handy function, pairwise_distances, to compute the pairwise distances. You just need to reshape the arrays, because it expects 2D arrays as input. Then I would also use seaborn, as Eduardo suggested.
import numpy as np
from sklearn.metrics import pairwise_distances
import seaborn as sns
a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])
distances = pairwise_distances(X=a.reshape(-1, 1), Y=b.reshape(-1, 1))
sns.heatmap(distances, square=True, annot=True, cbar=False, cmap='Blues');
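For one-dimensional data like this, the default Euclidean metric reduces to the absolute difference between each pair of values. So, as a minimal sketch (an equivalent I am assuming for illustration, not something scikit-learn requires), the same matrix can be built with NumPy broadcasting alone:

```python
import numpy as np

a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])

# Entry [i, j] is |a[i] - b[j]|: rows follow a, columns follow b.
distances = np.abs(a[:, None] - b[None, :])

# The diagonal compares values at the same position in both arrays.
same_position = np.diag(distances)
```

This makes it easier to see what each cell of the heatmap means: row i, column j compares a[i] with b[j].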
Edit: To reverse the colors, you can use the colormap 'Blues_r' instead. I don't know if there is a way to flip the y-axis at the seaborn level, but you can always flip the input data and change the labels accordingly:
distances = pairwise_distances(X=np.flip(a).reshape(-1, 1), Y=b.reshape(-1, 1))
sns.heatmap(distances, square=True, annot=True, cbar=False, cmap='Blues_r',
            yticklabels=list(reversed(range(len(a)))));
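Another option works at the matplotlib level: sns.heatmap returns the matplotlib Axes it draws on, and Axes.invert_yaxis() flips the row order without touching the data or the labels. A minimal sketch, offered as an alternative rather than as what the code above does:

```python
import numpy as np
import seaborn as sns
from sklearn.metrics import pairwise_distances

a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])
distances = pairwise_distances(X=a.reshape(-1, 1), Y=b.reshape(-1, 1))

ax = sns.heatmap(distances, square=True, annot=True, cbar=False, cmap='Blues_r')

# seaborn draws row 0 at the top; inverting the y-axis moves it to the
# bottom while each tick label stays attached to its own row.
ax.invert_yaxis()
```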
Upvotes: 3
Reputation: 260490
What does your data represent? There are many ways to compare things and determine whether they are different. You could compute the difference, the ratio, etc. There is no right way to answer your question without a bit more context.
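As a rough sketch of the two comparisons just mentioned, using the arrays from the question (the NaN handling for a zero denominator is my own assumption about what you would want):

```python
import numpy as np

a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])

# Signed difference: positive where a is larger than b.
difference = a - b

# Ratio a / b, left as NaN where b is zero to avoid dividing by zero.
ratio = np.divide(a, b, out=np.full_like(a, np.nan), where=(b != 0))
```

The two measures can disagree: 20 vs. 10 and 60 vs. 50 both differ by 10, but the ratios are 2.0 and 1.2, which is why the meaning of the data matters.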
If your two values are supposed to be proportional, I would plot them as a scatter plot with each one as an axis:
import numpy as np
import pandas as pd
a = np.array([5., 10., 20., 19., 1., 10., 60.])
b = np.array([7., 10., 10., 17., 20., 0., 50.])
df = pd.DataFrame({'a': a, 'b': b})
df.plot.scatter(x='a', y='b')
You could also use seaborn's regplot:
import seaborn as sns
ax = sns.regplot(data=df, x='a', y='b', robust=True)
If you really want to use a heatmap, I would go for a clustermap, as this groups similar values together and separates those that differ:
sns.clustermap(df)
Use the annot=True parameter to display the values.
Upvotes: 1