Reputation: 11
I ran into a problem with my WordCloud code in Python when trying to process a large amount of Arabic text. This is my code:
from os import path
import codecs
from wordcloud import WordCloud
import arabic_reshaper
from bidi.algorithm import get_display
d = path.dirname(__file__)
f = codecs.open(path.join(d, 'C:/example.txt'), 'r', 'utf-8')
text = arabic_reshaper.reshape(f.read())
text = get_display(text)
wordcloud = WordCloud(font_path='arial',background_color='white', mode='RGB',width=1500,height=800).generate(text)
wordcloud.to_file("arabic_example.png")
And this is the error I get:
Traceback (most recent call last):
File "", line 1, in runfile('C:/Users/aam20/Desktop/python/codes/WordClouds/wordcloud_True.py', wdir='C:/Users/aam20/Desktop/python/codes/WordClouds')
File "C:\Users\aam20\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile execfile(filename, namespace)
File "C:\Users\aam20\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/aam20/Desktop/python/codes/WordClouds/wordcloud_True.py", line 28, in text = get_display(text)
File "C:\Users\aam20\Anaconda3\lib\site-packages\bidi\algorithm.py", line 648, in get_display resolve_implicit_levels(storage, debug)
File "C:\Users\aam20\Anaconda3\lib\site-packages\bidi\algorithm.py", line 466, in resolve_implicit_levels
'%s not allowed here' % _ch['type']
AssertionError: RLI not allowed here
Can someone help resolve this issue?
Upvotes: 1
Views: 3354
Reputation: 11
I had a similar issue, so I removed the emoji and then it worked just fine:
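from wordcloud import WordCloud
import arabic_reshaper
import emoji
from bidi.algorithm import get_display
# your_txt is the raw Arabic text loaded elsewhere
# Strip every emoji before reshaping
your_txt = emoji.replace_emoji(your_txt, replace='', version=-1)
# Join the Arabic letters into their contextual forms, then reorder for right-to-left display
your_txt = arabic_reshaper.reshape(your_txt)
your_txt = get_display(your_txt)
wordcloud = WordCloud(font_path='NotoNaskhArabic-Regular.ttf').generate(your_txt)
wordcloud.to_file("ar_wordCloud.png")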
Upvotes: 1
Reputation: 87
I preprocessed the text with the function below before calling the reshaper, and it worked for me.
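import re

def removeWeirdChars(text):
    # Character class of emoji, symbols and invisible formatting characters to drop.
    # Note: the wide \U000024C2-\U0001F251 range also covers the Arabic presentation
    # forms (U+FB50-U+FDFF, U+FE70-U+FEFF), so run this before reshaping, not after.
    weirdPatterns = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001f926-\U0001f937"
        u'\U00010000-\U0010ffff'
        u"\u200d"  # zero width joiner
        u"\u2640-\u2642"
        u"\u2600-\u2B55"
        u"\u23cf"
        u"\u23e9"
        u"\u231a"
        u"\u3030"
        u"\ufe0f"  # variation selector-16
        u"\u2069"  # pop directional isolate (PDI)
        u"\u2066"  # left-to-right isolate (LRI)
        u"\u200c"  # zero width non-joiner
        u"\u2068"  # first strong isolate (FSI)
        u"\u2067"  # right-to-left isolate (RLI)
        "]+", flags=re.UNICODE)
    return weirdPatterns.sub(r'', text)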
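The AssertionError in the question names RLI, which is a directional isolate character (U+2067), and the regex above removes U+2066 to U+2069 among many other things. As a narrower sketch, assuming those directional controls are the only characters get_display trips over, the standard unicodedata module can report where they are and strip just them. The names find_bidi_controls, strip_bidi_controls and BIDI_CONTROL_CLASSES are made up for this example and do not come from any of the libraries above.

```python
import unicodedata

# Bidi classes of the explicit directional controls: isolates (LRI, RLI, FSI, PDI)
# and embeddings/overrides (LRE, RLE, LRO, RLO, PDF).
BIDI_CONTROL_CLASSES = {"LRI", "RLI", "FSI", "PDI", "LRE", "RLE", "LRO", "RLO", "PDF"}

def find_bidi_controls(text):
    """Return (index, code point, bidi class) for each directional control character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]

def strip_bidi_controls(text):
    """Drop directional control characters and keep everything else unchanged."""
    return "".join(
        ch for ch in text
        if unicodedata.bidirectional(ch) not in BIDI_CONTROL_CLASSES
    )

# Hypothetical sample: an Arabic word wrapped in RLI ... PDI, followed by Latin text
sample = "\u2067مرحبا\u2069 world"
print(find_bidi_controls(sample))
cleaned = strip_bidi_controls(sample)
```

The cleaned string can then go through arabic_reshaper.reshape and get_display as in the other answers. Unlike the regex, this leaves emoji in place, so it may not be enough if the font or the reshaper also has trouble with those.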
Upvotes: 5
Reputation: 148
Here is a simple way to generate an Arabic word cloud:
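from wordcloud import WordCloud
import arabic_reshaper
from bidi.algorithm import get_display
# text is the raw Arabic string to visualize
reshaped_text = arabic_reshaper.reshape(text)  # join letters into contextual forms
bidi_text = get_display(reshaped_text)  # reorder for right-to-left display
wordcloud = WordCloud(font_path='NotoNaskhArabic-Regular.ttf').generate(bidi_text)
wordcloud.to_file("wordCloud.png")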
And here is a link to a Google Colab example: Colab notebook
Upvotes: 0
Reputation: 1
There is a character in your text that get_display() is unable to deal with. You can find this character and add it to a list of stopwords, but that can be very painful. One shortcut is to build a dictionary of the most frequent words and their frequencies and pass it to the generate_from_frequencies function:
wordcloud = WordCloud(font_path='arial',background_color='white', mode='RGB',width=1500,height=800).generate_from_frequencies(YOURDICT)
For more information check my response to this post.
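One way to build such a dictionary, as a rough sketch: count whitespace-separated words with collections.Counter and keep the most common ones. The helper name top_word_frequencies and its limit parameter are invented for this example, and splitting on whitespace is an assumption about how the words should be tokenized.

```python
from collections import Counter

def top_word_frequencies(text, limit=200):
    """Map the `limit` most common whitespace-separated words to their counts."""
    counts = Counter(text.split())
    return dict(counts.most_common(limit))

# Hypothetical sample text: three distinct words with different counts
frequencies = top_word_frequencies("قطة كلب قطة طائر قطة كلب", limit=2)
print(frequencies)
```

The resulting dictionary plays the role of YOURDICT above. For Arabic the keys would presumably still need arabic_reshaper.reshape and get_display applied word by word before rendering, and a word that contains the problematic character could simply be skipped at that stage.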
Upvotes: 0