Reputation: 129
For the sake of learning, I'm working on a word cloud program that counts the number of times each word appears in a text and draws the result as a kind of "word cloud" image.
The program works fine, but I would like to address a couple of aesthetic issues, like:
... and...
I would like it to look something like this (or at least as close to it as possible).
The code in question is:
import matplotlib.pyplot as plt
import numpy as np

filename = "adventure.txt"
wordcounts = {}
with open(filename) as infile:
    for line in infile:
        for word in line.split():
            #Keep letters only and convert to lower case
            w = "".join(e for e in word if e.isalpha()).lower()
            if not w:
                continue  #Token had no letters, e.g. "--" or "1887"
            wordcounts[w] = wordcounts.get(w, 0) + 1
#Put all words in list and sort by count, most frequent first
words = sorted(wordcounts, key=wordcounts.get, reverse=True)
#Set maximum font size to 50 (words[0] is the most frequent word)
scale = 50/wordcounts[words[0]]
#Set up empty plot with limits on x-axis and y-axis
plt.axes(xlim=(0,100), ylim=(0,100))
#Plot 50 most frequent words with size=frequency
N = min(len(words), 50)
colors = ["r", "g", "b", "m", "c", "k"]
for i in range(N):
    x = np.random.uniform(0,90)
    y = np.random.uniform(0,90)
    freq = wordcounts[words[i]]
    col = colors[i % len(colors)]  #Cycle through all six colors
    plt.text(x, y, words[i], fontsize=scale * freq, color=col)
plt.show()
All help is welcomed and appreciated.
Upvotes: 0
Views: 2162
Reputation: 1490
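Create the figure and an axes object without ticks, tick labels, or a frame. Pass the axis limits here as well, because a new axes defaults to the range 0 to 1 while the words are placed at coordinates up to 90:
fig = plt.figure(figsize=(10, 10), num=1, clear=True)
ax = plt.subplot(1, 1, 1, xlim=(0, 100), ylim=(0, 100), xticks=[], yticks=[], frameon=False)
Remove this line, since the subplot call now sets the limits:
plt.axes(xlim=(0,100), ylim=(0,100))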
The plotting loop then draws on ax instead of calling plt.text:
for i in range(0,N):
    x = np.random.uniform(0,90)
    y = np.random.uniform(0,90)
    freq = wordcounts[words[i]]
    col = ["r", "g", "b", "m", "c", "k"][i % 6]  # % 6 so that "k" is used too
    ax.text(x, y, words[i], fontsize=scale * freq, color=col)
plt.show()
Making your plot look similar to the example you provided will take a lot of manual trial and error: you'll have to plug in coordinates for each word and decide where each one looks best. A few tips:
Plot the smallest words first and the largest word last, so that the larger text doesn't have smaller text overlaying it. Because words is sorted from most to least frequent, that means running the loop in reverse, e.g. for i in reversed(range(N)).
Choose non-overlapping coordinates with enough distance that the words aren't too close to one another. Alternatively, if you want the words to touch, you could control how much they are allowed to overlap.
Shuffle the list of colors so that they are more "scattered" (rather than having all of the yellow text in the top right corner, for example).
Maybe center the larger text and push the smaller text toward the periphery.
The list goes on, but I think you get the idea.
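If you'd rather not pick every coordinate by hand, the non-overlapping idea can be roughed out with simple rejection sampling: draw a random position, keep it only if the word's box doesn't hit any box already placed, and otherwise try again. This is only a sketch, not how any real word cloud library does it. The names boxes_overlap and place_words are made up for illustration, and the width and height of each word (in data units) are assumed to be known; in the example they are invented numbers. In matplotlib you could measure real sizes from the Text object's get_window_extent() after drawing.

```python
import random

def boxes_overlap(a, b):
    """Return True if two (x, y, width, height) rectangles intersect."""
    ax_, ay_, aw, ah = a
    bx_, by_, bw, bh = b
    return (ax_ < bx_ + bw and bx_ < ax_ + aw and
            ay_ < by_ + bh and by_ < ay_ + ah)

def place_words(sizes, extent=100.0, max_tries=500, seed=None):
    """Pick a non-overlapping lower-left corner for each (word, width, height).

    Words are placed largest first so the big ones get room. A word that
    cannot be placed within max_tries attempts is skipped.
    Returns a dict {word: (x, y)}.
    """
    rng = random.Random(seed)
    placed = []      # rectangles accepted so far
    positions = {}
    by_area = sorted(sizes, key=lambda s: s[1] * s[2], reverse=True)
    for word, w, h in by_area:
        if w > extent or h > extent:
            continue  # can never fit inside the plot area
        for _ in range(max_tries):
            x = rng.uniform(0, extent - w)
            y = rng.uniform(0, extent - h)
            box = (x, y, w, h)
            if not any(boxes_overlap(box, other) for other in placed):
                placed.append(box)
                positions[word] = (x, y)
                break
    return positions

# Hypothetical sizes in data units: (word, width, height)
sizes = [("the", 30, 12), ("and", 20, 8), ("holmes", 25, 6), ("of", 8, 5)]
positions = place_words(sizes, seed=1)
for word, (x, y) in positions.items():
    print(f"{word}: x={x:.1f}, y={y:.1f}")
```

You would then call ax.text(x, y, word, ...) with these positions instead of the np.random.uniform values. Rejection sampling gets slow once the plot fills up, which is why max_tries is there and why some small words may be dropped.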
Upvotes: 1