At first I thought the problem was with my data and that I had made a mistake while cleaning it. However, I checked, and that is not the case.
I am using this code:
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# df is the DataFrame shown below, with the tweets in the 'full_text' column
plt.style.use('fivethirtyeight')
# Join all tweets into one string for WordCloud to tokenize
allWords = ' '.join(df['full_text'])
wordCloud = WordCloud(collocations=True, width=1000, height=600,
                      random_state=21, max_font_size=120).generate(allWords)
plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()
Now my word cloud shows truncated words like "coronaviru", "viru", and "crisi". With collocations=True, it also shows the full words, but only in combination with other words, e.g. "coronavirus case" and "coronavirus pandemic".
Does anyone know how to fix this?
As I said, I checked the data, and it always contains the correct full word, so I suspect the problem happens inside WordCloud.
My data looks like this:
   created_at                      id              full_text
0  Sat Aug 01 00:25:53 +0000 2020  28934685093219  life is hard with coronavirus
1  Sat Aug 01 00:25:53 +0000 2020  28934685093219  coronavirus sucks
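One way to double-check that the cleaned text really contains no truncated forms is to count whole-word occurrences of the suspicious tokens. The sketch below uses a hypothetical helper, `count_exact_tokens` (not part of pandas or wordcloud); it accepts any iterable of strings, so `df['full_text']` can be passed to it directly. Its simple word pattern is an assumption and may split text slightly differently from WordCloud's own tokenizer.

```python
import re
from typing import Dict, Iterable


def count_exact_tokens(texts: Iterable[str], targets: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive, whole-word occurrences of each target token."""
    # Keep the targets in the order given, each starting at zero
    counts = dict.fromkeys((t.lower() for t in targets), 0)
    for text in texts:
        # str() guards against non-string cells such as NaN in a DataFrame column
        for token in re.findall(r"\w[\w']*", str(text).lower()):
            if token in counts:
                counts[token] += 1
    return counts


# Two rows mirroring the data sample above; in practice pass df['full_text']
rows = ["life is hard with coronavirus", "coronavirus sucks"]
print(count_exact_tokens(rows, ["coronavirus", "coronaviru", "viru", "crisi"]))
# {'coronavirus': 2, 'coronaviru': 0, 'viru': 0, 'crisi': 0}
```

A non-zero count for a truncated form would mean it is already present in the text before WordCloud ever sees it.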
You need to change a parameter of the WordCloud constructor: normalize_plurals=False. Reference: https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html
normalize_plurals: bool, default=True. Whether to remove trailing ‘s’ from words. If True and a word appears with and without a trailing ‘s’, the one with trailing ‘s’ is removed and its counts are added to the version without trailing ‘s’ – unless the word ends with ‘ss’. Ignored if using generate_from_frequencies.
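To see what the quoted rule does to a set of word counts, here is a small pure-Python sketch of it. It is written from the documentation text above, not taken from the wordcloud source, so details such as case handling may differ; `merge_plurals` is a hypothetical name.

```python
from typing import Dict


def merge_plurals(counts: Dict[str, int]) -> Dict[str, int]:
    """Fold 'words' into 'word' when both appear, as the normalize_plurals docs describe."""
    merged = dict(counts)
    for word in list(counts):
        singular = word[:-1]
        # Only a single trailing 's' is stripped; words ending in 'ss' are left alone,
        # and nothing happens unless the 's'-less form is also present.
        if word.endswith("s") and not word.endswith("ss") and singular in counts:
            merged[singular] += merged.pop(word)
    return merged


words = {"coronavirus": 2, "coronaviru": 1, "crisis": 3, "class": 1}
print(merge_plurals(words))
# {'coronaviru': 3, 'crisis': 3, 'class': 1}
```

Under this rule, "coronavirus" is only folded into "coronaviru" when "coronaviru" also occurs; with normalize_plurals=False the merge is skipped and both forms keep their own counts.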
You must be doing something wrong elsewhere, because your code works for me:
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# Two sample rows mirroring the question's data
data = {'full_text': ['life is hard with coronavirus', 'coronavirus sucks']}
df = pd.DataFrame(data)
plt.style.use('fivethirtyeight')
allWords = ' '.join(df['full_text'])
wordCloud = WordCloud(collocations=True, width=1000, height=600,
                      random_state=21, max_font_size=120).generate(allWords)
plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()
This is the output: [word cloud image]