Reputation: 11
I tried the code below:
!pip install python-bidi
from wordcloud import WordCloud
from matplotlib import pyplot as plt
from bidi.algorithm import get_display
text="""মুস্তাফিজ"""
bidi_text = get_display(text)
print(bidi_text)
# https://github.com/amueller/word_cloud/issues/367
# https://stackoverflow.com/questions/54063438/create-wordcloud-in-python-for-foreign-language-hebrew
# https://www.omicronlab.com/bangla-fonts.html
rgx = r"[\u0980-\u09FF]+"
wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)
#wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
Then I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-87-56d899c0de07> in <module>()
12 # https://www.omicronlab.com/bangla-fonts.html
13 rgx = r"[\u0980-\u09FF]+"
---> 14 wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)
15
16 #wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)
2 frames
/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate_from_frequencies(self, frequencies, max_font_size)
381 if len(frequencies) <= 0:
382 raise ValueError("We need at least 1 word to plot a word cloud, "
--> 383 "got %d." % len(frequencies))
384 frequencies = frequencies[:self.max_words]
385
ValueError: We need at least 1 word to plot a word cloud, got 0.
This line does not pick up the Bangla words: wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)
I tried almost all the Bangla fonts from https://www.omicronlab.com/bangla-fonts.html, but nothing works.
Upvotes: 1
Views: 1211
Reputation: 11
I followed this comment and eventually solved the problem on Ubuntu.
Step 1:
!sudo apt-get install libfreetype6-dev libharfbuzz-dev libfribidi-dev gtk-doc-tools
Step 2:
!wget -O raqm-0.7.0.tar.gz https://raw.githubusercontent.com/python-pillow/pillow-depends/master/raqm-0.7.0.tar.gz
The raqm-0.7.0.tar.gz file should now be in your current working directory.
Step 3:
!tar -xzvf raqm-0.7.0.tar.gz
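Step 4 (in a notebook, use the %cd magic: !cd only changes the directory inside a throwaway subshell, so Step 5 would not run inside the extracted folder):
%cd raqm-0.7.0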
Step 5:
!./configure --prefix=/usr && make -j4 && sudo make -j4 install
Step 6:
Now reinstall the Pillow library. Activate the correct Python environment, then run the following commands:
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade Pillow
That's it! You now have a Pillow library that can render Bengali and other Indic scripts correctly in the image.
Also, as @Farzana Eva suggested in her comment, you need to pass the rgx variable to the WordCloud object (regexp=rgx).
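To confirm that the reinstalled Pillow actually picked up libraqm, a quick check along these lines should work. It assumes a Pillow version that provides PIL.features.check and knows the "raqm" feature name; raqm_available is a hypothetical helper, not part of Pillow or wordcloud.

```python
# Hypothetical helper: reports whether Pillow can use the libraqm layout engine,
# which is what lets Pillow shape complex scripts such as Bengali.
def raqm_available() -> bool:
    try:
        from PIL import features  # Pillow's feature-detection module
    except ImportError:
        # Pillow is not installed in this environment at all
        return False
    # features.check returns True if the feature is available, False or None otherwise
    return bool(features.check("raqm"))


if __name__ == "__main__":
    print("raqm available:", raqm_available())
```

If this prints False after Step 6, Python is most likely still importing the old Pillow build; in a notebook, restarting the runtime after the reinstall is usually needed before the new build is picked up.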
Upvotes: 0
Reputation: 2220
I have generated a word cloud in Bangla using the following code. You can try it out:
import os
from wordcloud import WordCloud

def generate_Word_cloud(self, author_post, vocabularyWordnumber, img_file, stop_word_root_path):
    # Load the Bangla stop-word list (one word per line)
    stop_word_file = os.path.join(stop_word_root_path, 'stopwords-bn.txt')
    with open(stop_word_file, "r", encoding="utf8") as f:
        stop_word = set(f.read().split("\n"))
    # Join all posts into one text for WordCloud to tokenize
    final_text = " ".join(author_post)
    wordcloud = WordCloud(stopwords=stop_word,
                          # Font with Bangla glyphs; adjust the path to wherever kalpurush.ttf lives
                          font_path='/usr/share/fonts/truetype/freefont/kalpurush.ttf',
                          # Match whole Bangla words (letters plus vowel signs and virama)
                          regexp=r"[\u0980-\u09FF]+",
                          width=600, height=500, max_font_size=300,
                          max_words=vocabularyWordnumber,
                          min_word_length=4, background_color="black").generate(final_text)
    wordcloud.to_file(img_file)
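Roughly speaking, WordCloud.generate tokenizes the text with its regexp, drops stop words and words shorter than min_word_length, counts what is left and hands those counts to generate_from_frequencies; an empty result there is what raises the "We need at least 1 word" ValueError from the question. The stdlib-only sketch below is a simplified approximation of that pipeline, not wordcloud's actual code (bangla_frequencies is a hypothetical helper), and is handy for checking what your Bangla text and stop-word list leave behind before drawing anything.

```python
import re
from collections import Counter

# Same Bengali-block pattern as the rgx variable used in the question
BANGLA_WORD = r"[\u0980-\u09FF]+"


def bangla_frequencies(text, stop_words, min_word_length=4):
    """Hypothetical helper approximating WordCloud's tokenize/filter/count steps."""
    words = re.findall(BANGLA_WORD, text)
    # len() counts code points, so vowel signs and the virama count toward the length
    kept = [w for w in words if w not in stop_words and len(w) >= min_word_length]
    return dict(Counter(kept))


if __name__ == "__main__":
    word = "\u09ae\u09c1\u09b8\u09cd\u09a4\u09be\u09ab\u09bf\u099c"  # মুস্তাফিজ
    print(bangla_frequencies(f"{word} {word}", stop_words=set()))
```

If this returns an empty dict for your data, WordCloud will raise the same ValueError, so the culprit is the pattern, the stop words or min_word_length rather than the font.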
Upvotes: 0
Reputation: 21
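You defined rgx but never passed it to WordCloud, so WordCloud tokenized the text with its default regular expression instead. That default is built on \w, and in Python \w does not match Bengali vowel signs such as ু or the virama ্, so the Bangla word never matched as a whole and WordCloud ended up with an empty word list. Passing rgx as the regexp argument when creating the WordCloud object solves the issue: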
wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf',regexp=rgx).generate(bidi_text)
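The difference between the two patterns is easy to see with plain re. The default pattern below, \w[\w']+, is an assumption based on wordcloud's source (some versions use \w[\w']* when min_word_length is 1 or less), so check wordcloud.py in your installed version; rgx is the pattern from the question.

```python
import re

# মুস্তাফিজ spelled out as code points so the example survives any encoding issues
word = "\u09ae\u09c1\u09b8\u09cd\u09a4\u09be\u09ab\u09bf\u099c"

# Assumed wordcloud-style default: a word character followed by at least one more
default_pattern = r"\w[\w']+"
# Bengali Unicode block, as in the question's rgx
rgx = r"[\u0980-\u09FF]+"

# Every consonant here is followed by a vowel sign or virama, which \w does not
# match, so there is no run of two \w characters and nothing is found.
print(re.findall(default_pattern, word))  # → []
# The Bengali-block pattern keeps the whole word, signs included.
print(re.findall(rgx, word))  # → ['মুস্তাফিজ']
```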
Here is the full code:
!pip install python-bidi
from wordcloud import WordCloud
from matplotlib import pyplot as plt
from bidi.algorithm import get_display
text="""মুস্তাফিজ"""
bidi_text = get_display(text)
print(bidi_text)
# https://github.com/amueller/word_cloud/issues/367
# https://stackoverflow.com/questions/54063438/create-wordcloud-in-python-for-foreign-language-hebrew
# https://www.omicronlab.com/bangla-fonts.html
rgx = r"[\u0980-\u09FF]+"
wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf',
regexp=rgx).generate(bidi_text)
#wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
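A side note on the get_display call that both snippets keep: python-bidi reorders right-to-left text (Hebrew, Arabic, as in the linked Hebrew question) into display order. Bengali is a left-to-right script, so get_display is expected to leave this text unchanged and can likely be dropped; the regexp is what fixes the error. The stdlib sketch below inspects a string's Unicode bidi classes to decide whether reordering is relevant at all. needs_bidi_reordering is a hypothetical helper, and treating only R and AL characters as the trigger is a simplification of the full bidi algorithm.

```python
import unicodedata


# Hypothetical helper: True if the text contains strong right-to-left characters
# (bidi classes R or AL, e.g. Hebrew or Arabic letters), the case get_display targets.
def needs_bidi_reordering(text: str) -> bool:
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


if __name__ == "__main__":
    bangla = "\u09ae\u09c1\u09b8\u09cd\u09a4\u09be\u09ab\u09bf\u099c"  # মুস্তাফিজ
    hebrew = "\u05e9\u05dc\u05d5\u05dd"  # שלום
    print(needs_bidi_reordering(bangla))  # → False (Bengali letters are class L, signs L or NSM)
    print(needs_bidi_reordering(hebrew))  # → True
```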
Upvotes: 2