Ben W

Reputation: 133

My WordCloud is missing the letter 's' at the end of words

At first I thought the problem was with my data and that I had made a mistake while cleaning it. However, I checked, and that is not the case.

I am using this code:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

plt.style.use('fivethirtyeight')

# Join all tweets into a single string (df is the DataFrame shown below)
allWords = ' '.join([twts for twts in df['full_text']])
wordCloud = WordCloud(collocations=True, width=1000, height=600,
                      random_state=21, max_font_size=120).generate(allWords)

plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()

Now my word cloud shows words like "coronaviru", "viru", and "crisi". With collocations=True, it shows the full words in combination with other words, like "coronavirus case" and "coronavirus pandemic". Does anyone know how to fix this? As I said, I checked the data, and the correct full word is always there, so I guess the mistake happens in the word cloud.

My data looks like this:

    created_at                        id                full_text
0   Sat Aug 01 00:25:53 +0000 2020    28934685093219    life is hard with coronavirus
1   Sat Aug 01 00:25:53 +0000 2020    28934685093219    coronavirus sucks

Upvotes: 1

Views: 685

Answers (2)

JustLinh

Reputation: 36

You need to set one parameter in the WordCloud constructor: normalize_plurals=False. Reference: https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html

normalize_plurals: bool, default=True. Whether to remove trailing ‘s’ from words. If True and a word appears with and without a trailing ‘s’, the one with trailing ‘s’ is removed and its counts are added to the version without trailing ‘s’ – unless the word ends with ‘ss’. Ignored if using generate_from_frequencies.
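To make the rule quoted above concrete, here is a minimal pure-Python sketch of the merging behaviour as the documentation describes it. It is only an illustration, not the library's actual implementation: the function name count_words and the whitespace tokenisation are assumptions made for this example.

```python
from collections import Counter


def count_words(text, normalize_plurals=True):
    """Count lowercase words; optionally fold 'xs' into 'x' when both occur.

    Hypothetical helper that mimics the documented rule; it is not part of
    the wordcloud package.
    """
    counts = Counter(text.lower().split())
    if normalize_plurals:
        for word in list(counts):
            singular = word[:-1]
            # Merge only if the word ends in a single 's' (not 'ss') and the
            # form without the trailing 's' also appears in the text.
            if word.endswith("s") and not word.endswith("ss") and singular in counts:
                counts[singular] += counts.pop(word)
    return dict(counts)


text = "coronavirus coronavirus coronaviru crisis crisi class"
print(count_words(text))                           # plurals folded into the shorter form
print(count_words(text, normalize_plurals=False))  # every word kept as written
```

Under this rule a word loses its 's' only when the shorter form also occurs somewhere in the text, so a sample that contains "coronavirus" but never "coronaviru" would not show the problem. In the question's code, the fix would be to add normalize_plurals=False to the WordCloud(...) call.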

Upvotes: 2

gtomer

Reputation: 6574

You must be doing something wrong; your code works for me:

import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

array = {'full_text': ['life is hard with coronavirus', 'coronavirus sucks']}
df = pd.DataFrame(array)

plt.style.use('fivethirtyeight')
allWords = ' '.join([twts for twts in df['full_text']])
wordCloud = WordCloud(collocations=True, width=1000, height=600,
                      random_state=21, max_font_size=120).generate(allWords)

plt.imshow(wordCloud, interpolation = "bilinear")
plt.axis('off')
plt.show()

This is the output:

[Image: the resulting word cloud]

Upvotes: 1
