Reputation: 21
I am working on a Hindi dataset for a project. After pre-processing the data, I am creating a word cloud from it. I am using the "gargi" font to plot the Hindi words, but I am facing an issue with the accent ("ि मात्रा"). In the word cloud, this accent is rendered next to the letter it is supposed to be attached to; for example, पुलिस is rendered as पुलसि. (Kindly refer to the attached image, where the word किसान shows the accent (मात्रा) on the same letter (वर्ण).)
Several other words in this word cloud show the same issue. I have also tried other fonts, such as "lohit-devnagri" and "samyak-devnagri".
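import matplotlib.pyplot as plt
from wordcloud import WordCloud

# counter_kisaan is the word -> frequency mapping built during pre-processing (not shown)
font = "gargi.ttf"  # path to the Devanagari font file

figure, axis = plt.subplots(2, 2, figsize=(16, 10))
figure.tight_layout(pad=5.0)

wordcloud_kisaan = WordCloud(
    width=1000, height=700, background_color="white",
    min_font_size=10, font_path=font,
).generate_from_frequencies(counter_kisaan)

# Draw the word cloud in the top-left subplot
axis[0][0].imshow(wordcloud_kisaan, interpolation="bilinear")
axis[0][0].axis("off")
axis[0][0].set_title("Kisaan Andolan", fontsize=22)

plt.axis("off")
plt.tight_layout(pad=5.0)
plt.show()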
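To rule out a problem in the data itself, here is a minimal sketch that prints the stored order of the characters in पुलिस. In Unicode, the vowel sign ि (U+093F) is stored after the consonant it belongs to and is moved to the left of it only when the text is rendered. If the output shows ि directly after ल, the string is fine, and the misplacement likely comes from the rendering step rather than from pre-processing. The helper name `describe` is my own and is not part of any library.

```python
import unicodedata


def describe(word):
    """Return (character, code point, Unicode name) for each character in word."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


word = "पुलिस"

# Print the characters in the order they are stored (logical order)
for ch, code_point, name in describe(word):
    print(code_point, name)

# The vowel sign I (ि) should be stored right after the consonant it modifies
VOWEL_SIGN_I = "\u093F"
matra_index = word.index(VOWEL_SIGN_I)
print("Vowel sign I follows:", unicodedata.name(word[matra_index - 1]))
```

The same check can be run on any key of `counter_kisaan` that looks wrong in the image.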
Upvotes: 2
Views: 312