Reputation: 435
I am working on a project to find the similarity between two sentences/documents using the tf-idf measure.
My question is: how can I show the similarity in a graphical format? Something like a Venn diagram where the size of the intersection represents the similarity measure, or any other plot available in matplotlib or another Python library.
I tried the following code:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
documents = (
"The sky is blue",
"The sun is bright"
)
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
print(tfidf_matrix)
cosine = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix)
print(cosine)
import matplotlib.pyplot as plt
r=25
d1 = 2 * r * (1 - cosine[0][0])
circle1=plt.Circle((0,0),d1/2,color='r')
d2 = 2 * r * (1 - cosine[0][1])
circle2=plt.Circle((r,0),d2/2,color="b")
fig = plt.gcf()
fig.gca().add_artist(circle1)
fig.gca().add_artist(circle2)
fig.savefig('plotcircles.png')
plt.show()
But the plot I got was empty. Can someone explain what the error might be?
Source for the circle-plotting code: plot a circle
Upvotes: 2
Views: 530
Reputation: 284850
Just to explain what's going on, here's a stand-alone example of your problem (if the circle is entirely outside the boundaries, nothing would be shown):
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
fig, ax = plt.subplots()
circ = Circle((1, 1), 0.5)
ax.add_artist(circ)
plt.show()
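Applied to the circles in the question, the same effect can be checked with a little arithmetic. The sketch below is only an illustration: circle_overlaps_view is a hypothetical helper (not a matplotlib function), the view is assumed to be the default 0..1 range on both axes, and similarity_to_other is a made-up placeholder rather than a value computed from the question's documents.

```python
# Sketch: does a circle overlap the default axes view (0..1 in x and y)?
# circle_overlaps_view is a hypothetical helper, not part of matplotlib.

def circle_overlaps_view(center, radius, xlim=(0.0, 1.0), ylim=(0.0, 1.0)):
    """Return True if a circle with positive radius overlaps the rectangular view."""
    if radius <= 0:
        return False  # a zero-radius circle draws nothing
    cx, cy = center
    # Closest point of the view rectangle to the circle centre.
    nearest_x = min(max(cx, xlim[0]), xlim[1])
    nearest_y = min(max(cy, ylim[0]), ylim[1])
    distance_sq = (cx - nearest_x) ** 2 + (cy - nearest_y) ** 2
    return distance_sq <= radius ** 2


r = 25
similarity_to_self = 1.0    # cosine similarity of a document with itself
similarity_to_other = 0.3   # placeholder value, assumed for illustration

# Same formulas as in the question: diameter = 2 * r * (1 - similarity).
radius1 = 2 * r * (1 - similarity_to_self) / 2
radius2 = 2 * r * (1 - similarity_to_other) / 2

visible1 = circle_overlaps_view((0, 0), radius1)
visible2 = circle_overlaps_view((r, 0), radius2)
print("circle 1: radius", radius1, "visible:", visible1)
print("circle 2: radius", radius2, "visible:", visible2)
```

Under these assumptions the first circle has zero radius (a document is perfectly similar to itself), and the second is centred at x = 25, well outside the default view, so neither shows up. With real tf-idf output the self-similarity may differ from 1 by floating-point noise, which would give a tiny rather than exactly zero radius.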
When you manually add an artist through add_artist, add_patch, etc., autoscaling isn't applied unless you explicitly request it. You're accessing the lower-level interface of matplotlib that the higher-level functions (e.g. plot) are built on top of. However, this is also the easiest way to add a single circle in data coordinates, so the lower-level interface is what you want in this case.
Furthermore, add_artist is too general for this. You actually want add_patch (plt.Circle is matplotlib.patches.Circle). The difference between add_artist and add_patch may seem arbitrary, but add_patch has extra logic to calculate the extent of a patch for autoscaling, whereas add_artist is the "bare" lower-level function that can take any artist, but doesn't do anything special. Autoscaling won't work correctly for a patch if you add it with add_artist.
To autoscale the plot based on the artists that you've added, call ax.autoscale(). As a quick example of autoscaling a manually added patch:
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
fig, ax = plt.subplots()
circ = Circle((1, 1), 0.5)
ax.add_patch(circ)
ax.autoscale()
plt.show()
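To see the difference between add_artist and add_patch without looking at a figure, one option is to compare the axis limits afterwards. This is a rough sketch rather than a definitive test: the circle position (5, 5) is an arbitrary choice that puts it outside the default view, and the Agg backend is assumed only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

# Case 1: add_artist does not make the axes rescale to the circle.
fig1, ax1 = plt.subplots()
ax1.add_artist(Circle((5, 5), 0.5))
artist_xlim = ax1.get_xlim()

# Case 2: add_patch records the patch extent, so autoscale() can use it.
fig2, ax2 = plt.subplots()
ax2.add_patch(Circle((5, 5), 0.5))
ax2.autoscale()
patch_xlim = ax2.get_xlim()

print("add_artist x-limits:", artist_xlim)
print("add_patch + autoscale x-limits:", patch_xlim)
```

In the first case the limits should stay nowhere near the circle, while in the second they should enclose the range 4.5 to 5.5 that the circle occupies.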
Your next question might be "why isn't the circle round?". It is, in data coordinates. However, the x and y scales of the plot (this is the aspect ratio, in matplotlib terminology) are currently different. To force them to be the same, call ax.axis('equal') or ax.axis('scaled'). We can actually leave out the call to autoscale in this case, as ax.axis('scaled') and ax.axis('equal') will effectively call it for us:
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
fig, ax = plt.subplots()
circ = Circle((1, 1), 0.5)
ax.add_patch(circ)
ax.axis('scaled')
plt.show()
Upvotes: 4
Reputation: 1588
The plots are not empty, but I guess your circles are too big!
I don't have sklearn installed, so I start at the point where you print cosine:
import matplotlib.pyplot as plt
## set constants
r = 1
d = 2 * r * (1 - cosine[0][1])
## draw circles
circle1=plt.Circle((0, 0), r, alpha=.5)
circle2=plt.Circle((d, 0), r, alpha=.5)
## set axis limits
plt.ylim([-1.1, 1.1])
plt.xlim([-1.1, 1.1 + d])
fig = plt.gcf()
fig.gca().add_artist(circle1)
fig.gca().add_artist(circle2)
## hide axes if you like
# fig.gca().get_xaxis().set_visible(False)
# fig.gca().get_yaxis().set_visible(False)
fig.savefig('venn_diagramm.png')
That also answers your other question, where I also added this piece of code!
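As a possible variation, the same layout can be combined with the add_patch and ax.axis('scaled') approach from the other answer, so the axis limits don't have to be set by hand. This is a minimal sketch under stated assumptions: draw_similarity_circles is a name made up for this example, the similarity value 0.4 is a placeholder rather than a result from the question's documents, and the Agg backend is assumed only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.patches import Circle


def draw_similarity_circles(similarity, r=1.0):
    """Draw two radius-r circles whose centres are 2 * r * (1 - similarity) apart."""
    d = 2 * r * (1 - similarity)
    fig, ax = plt.subplots()
    # add_patch (not add_artist) so the circle extents feed into autoscaling.
    ax.add_patch(Circle((0, 0), r, alpha=0.5, color="r"))
    ax.add_patch(Circle((d, 0), r, alpha=0.5, color="b"))
    # 'scaled' gives equal x/y scales and fits the limits to the patches.
    ax.axis("scaled")
    return fig, ax, d


# 0.4 is a placeholder; pass cosine[0][1] from the question's code instead.
fig, ax, d = draw_similarity_circles(0.4)
print("centre distance:", d)
print("x-limits:", ax.get_xlim())
```

With a similarity of 1 the circles coincide, and with 0 they just touch. Call fig.savefig(...) on the returned figure to write the image to disk.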
Upvotes: 1