user7469692


Creating a word cloud using Python

I am trying to create a word cloud in Python after cleaning a text file.

I got the required results, i.e. the words that are used most often in the text file, but I am unable to plot them.

My code:

import collections
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Load the stopword list (one word per line); "with" closes the file afterwards
with open('stopwords', encoding='utf8') as f:
    stopwords = set(line.strip() for line in f)

wordcount = {}

with open('example.txt', encoding='utf8') as text_file:
    for word in text_file.read().split():
        # Normalize case and strip common punctuation
        word = word.lower()
        word = word.replace(".", "")
        word = word.replace(",", "")
        word = word.replace("\"", "")
        word = word.replace("“", "")
        if word not in stopwords:
            if word not in wordcount:
                wordcount[word] = 1
            else:
                wordcount[word] += 1

d = collections.Counter(wordcount)
for word, count in d.most_common(10):
    print(word, ":", count)

#wordcloud = WordCloud().generate(text)
#fig = plt.figure()
#fig.set_figwidth(14)
#fig.set_figheight(18)

#plt.imshow(wordcloud.recolor(color_func=grey_color, random_state=3))
#plt.title(title, color=fontcolor, size=30, y=1.01)
#plt.annotate(footer, xy=(0, -.025), xycoords='axes fraction', fontsize=infosize, color=fontcolor)
#plt.axis('off')
#plt.show()
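For reference, the manual dictionary counting above can be collapsed into collections.Counter. A minimal stdlib-only sketch, assuming the same punctuation rules as the loop; the sample sentence and the stopword set are made up for illustration and stand in for example.txt and the stopwords file:

```python
from collections import Counter

# Characters the original loop strips out: . , " and the curly quote “
STRIP = str.maketrans("", "", ".,\"\u201c")

def count_words(text, stopwords):
    """Lower-case each token, strip punctuation, drop stopwords, and count the rest."""
    words = (word.lower().translate(STRIP) for word in text.split())
    return Counter(word for word in words if word not in stopwords)

# Hypothetical sample input
sample = 'The cat sat. The cat ran, "the" dog sat.'
counts = count_words(sample, stopwords={"the"})
for word, count in counts.most_common(3):
    print(word, ":", count)
```

Counter is a dict subclass, so the result can be used wherever the original wordcount dict was used.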

Edit: I tried to plot the word cloud with the following code:

wordcloud = WordCloud(background_color='white',
                          width=1200,
                          height=1000
                         ).generate((d.most_common(10)))


plt.imshow(wordcloud)
plt.axis('off')
plt.show()

But I get TypeError: expected string or buffer.
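A likely reason for the error: most_common returns a list of (word, count) tuples, while WordCloud.generate expects a single string of text and tokenizes it with a regular expression. A stdlib-only sketch reproducing the same kind of failure; the pattern below is an assumption that only approximates wordcloud's default tokenizer, and the frequencies are made up:

```python
import re
from collections import Counter

d = Counter({"foo": 17, "bizz": 9, "buzz": 5})
top = d.most_common(2)
print(type(top).__name__, top)  # list [('foo', 17), ('bizz', 9)]

error = None
try:
    # A regex search needs a str (or bytes), not a list of tuples
    re.findall(r"\w[\w']*", top)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```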

When I changed the call to .generate(str(d.most_common(10))), the resulting word cloud shows an apostrophe (') after several words.
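Why the apostrophes appear: str() turns the list into its Python repr, quotes included, and a tokenizer that allows apostrophes inside words (so contractions such as "don't" survive) keeps each closing quote attached to the word before it. The sketch uses an assumed pattern approximating wordcloud's default (with this pattern the counts become tokens too), and also builds the word-to-count dict that WordCloud.generate_from_frequencies accepts, which avoids the string round trip altogether:

```python
import re
from collections import Counter

d = Counter({"foo": 17, "bizz": 9, "buzz": 5})

as_text = str(d.most_common(2))
print(as_text)  # [('foo', 17), ('bizz', 9)]

# Assumed tokenizer: a word character followed by word characters or apostrophes
tokens = re.findall(r"\w[\w']*", as_text)
print(tokens)  # ["foo'", '17', "bizz'", '9']

# Instead, keep the counts as a mapping of word -> frequency
top = dict(d.most_common(2))
print(top)  # {'foo': 17, 'bizz': 9}
```

With wordcloud installed, such a dict could be plotted with WordCloud(background_color='white', width=1200, height=1000).generate_from_frequencies(top), followed by plt.imshow as in the question.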

Environment: Jupyter Notebook, Python 3, IPython.

Upvotes: 0

Views: 11156

Answers (2)

Greg E

Reputation: 1

most_common(x) is not a method of WordCloud. However, you can pass the parameter max_words to the WordCloud constructor, and this should do what you're attempting.
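Roughly, max_words caps how many of the most frequent words are drawn. A pure-Python sketch of that cap, using a hypothetical helper top_words (not part of wordcloud) on a made-up frequency dict:

```python
from collections import Counter

def top_words(frequencies, max_words):
    """Hypothetical helper: keep only the max_words most frequent entries."""
    return dict(Counter(frequencies).most_common(max_words))

freqs = {"foo": 17, "bizz": 9, "buzz": 5}
print(top_words(freqs, 2))  # {'foo': 17, 'bizz': 9}
```

In wordcloud itself the cap is set on the constructor, e.g. WordCloud(max_words=10), so the cleaned text can still be passed to generate as a string.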

Upvotes: -1

glegoux

Reputation: 3603

First, download the Symbola.ttf font file into the same folder as the following script.

Folder layout:

file.txt
Symbola.ttf
my_word_cloud.py

file.txt:

foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz
foo foo foo foo foo foo foo foo foo foo bizz bizz bizz bizz foo foo

my_word_cloud.py:

import io
from collections import Counter
from os import path

import matplotlib.pyplot as plt
from wordcloud import WordCloud

d = path.dirname(__file__)

# Pass the encoding explicitly so the file is read as UTF-8 regardless of the system locale
text = io.open(path.join(d, 'file.txt'), encoding='utf-8').read()

words = text.split()
print(Counter(words))

# Generate a word cloud image
# The Symbola font includes most emoji
font_path = path.join(d, 'Symbola.ttf')
word_cloud = WordCloud(font_path=font_path).generate(text)

# Display the generated image:
plt.imshow(word_cloud)
plt.axis("off")
plt.show()
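Since the question mentions Jupyter Notebook: __file__ is not defined inside a notebook cell, so path.dirname(__file__) raises NameError there. A minimal sketch of a fallback to the current working directory; the helper name script_dir is hypothetical:

```python
import os

def script_dir():
    """Folder containing the running script, or the working directory in a notebook."""
    try:
        return os.path.dirname(os.path.abspath(__file__))
    except NameError:
        # Jupyter/IPython cells have no __file__, so fall back to the cwd
        return os.getcwd()

d = script_dir()
print(os.path.join(d, "file.txt"))
```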

Result:

Counter({'foo': 17, 'bizz': 9, 'buzz': 5})

[word cloud image]
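The counts in the result can be checked with the standard library alone; this sketch inlines the two lines of file.txt as a string:

```python
from collections import Counter

# The same contents as file.txt above
text = (
    "foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz\n"
    "foo foo foo foo foo foo foo foo foo foo bizz bizz bizz bizz foo foo\n"
)
counts = Counter(text.split())
print(counts)  # Counter({'foo': 17, 'bizz': 9, 'buzz': 5})
```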

I created a simple example for you above; you can find many other examples here:

https://github.com/amueller/word_cloud/tree/master/examples

Upvotes: 2
