Romcom1398

Reputation: 11

With wordcloud, it says 'only TrueType fonts supported' (despite adding the right path), or "TypeError: argument of type 'int' is not iterable"

I wanted to see if I could generate a word cloud from my fake dataframe, but I'm running into quite a bit of trouble. I used the code from this website: https://www.analyticsvidhya.com/blog/2020/04/beginners-guide-exploratory-data-analysis-text-data/

This is the code I have thus far:

Making the dataframe

import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

s1 = 'The fox and the hound walk through the woods together and find a bunny'
s2 = 'The closet is made of glass'
s3 = 'Cheetahs are one of the fastest animals in the world'
s4 = 'Vincent van Gogh was a phenomenal artist'
s5 = 'Once upon a time there was an evil queen who cast a glorious curse'
s6 = 'Emma and Regina would have been a perfect power couple'
s7 = 'The iphones camera is way worse than the android camera'
s8 = 'who even wears only white socks? Thats boring'
s9 = 'Ebby is the most precious dog in the whole world'
s10 = 'Birds go chirp chirp chirp'
viralc = [1, 0 ,1, 1, 0, 1, 0,1,1, 0]

df = pd.DataFrame({'titles':[s1, s2, s3, s4, s5, s6, s7, s8, s9, s10], 'viral':viralc})
df

Using groupby and turning the text into tf-idf

df_grouped=df[['titles', 'viral']].groupby(by='viral').agg(lambda x:' '.join(x))
df_grouped.head()

Attempting (and miserably failing) to generate a word cloud:

# Importing wordcloud for plotting word clouds and textwrap for wrapping longer text
from wordcloud import WordCloud
from textwrap import wrap

# Function for generating word clouds
def generate_wordcloud(data,title):
  wc = WordCloud(width=400, height=330, max_words=150,colormap="Dark2", font_path='C:\\Users\\Romy\\Documents\\Studie\\DataScience_Master\\Thesis\\Fonts\\arial.ttf').generate_from_frequencies(data.to_dict())
  plt.figure(figsize=(10,8))
  plt.imshow(wc, interpolation='bilinear')
  plt.axis("off")
  plt.title('\n'.join(wrap(title,60)),fontsize=13)
  plt.show()
  
# Transposing document term matrix
df_dtm=df_dtm.transpose()

df_dtm# Plotting word cloud for each product
for index,product in enumerate(df_dtm.columns):
    generate_wordcloud(df_dtm[product].sort_values(ascending=False), product)

This gives me either the error that wordcloud only supports TrueType fonts (which the font I specified is), or this error:

   ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [106], in <cell line: 18>()
     17 df_dtm# Plotting word cloud for each product
     18 for index,product in enumerate(df_dtm.columns):
---> 19     generate_wordcloud(df_dtm[product].sort_values(ascending=False), product)

Input In [106], in generate_wordcloud(data, title)
      6 def generate_wordcloud(data,title):
----> 7   wc = WordCloud(width=400, height=330, max_words=150,colormap="Dark2", font_path='C:\\Users\\Romy\\Documents\\Studie\\DataScience_Master\\Thesis\\Fonts\\arial.ttf').generate_from_frequencies(data.to_dict())
      8   plt.figure(figsize=(10,8))
      9   plt.imshow(wc, interpolation='bilinear')

File ~\anaconda3\lib\site-packages\wordcloud\wordcloud.py:453, in WordCloud.generate_from_frequencies(self, frequencies, max_font_size)
    451     font_size = self.height
    452 else:
--> 453     self.generate_from_frequencies(dict(frequencies[:2]),
    454                                    max_font_size=self.height)
    455     # find font sizes
    456     sizes = [x[1] for x in self.layout_]

File ~\anaconda3\lib\site-packages\wordcloud\wordcloud.py:508, in WordCloud.generate_from_frequencies(self, frequencies, max_font_size)
    505 transposed_font = ImageFont.TransposedFont(
    506     font, orientation=orientation)
    507 # get size of resulting text
--> 508 box_size = draw.textbbox((0, 0), word, font=transposed_font, anchor="lt")
    509 # find possible places using integral image:
    510 result = occupancy.sample_position(box_size[3] + self.margin,
    511                                    box_size[2] + self.margin,
    512                                    random_state)

File ~\anaconda3\lib\site-packages\PIL\ImageDraw.py:653, in ImageDraw.textbbox(self, xy, text, font, anchor, spacing, align, direction, features, language, stroke_width, embedded_color)
    650 if embedded_color and self.mode not in ("RGB", "RGBA"):
    651     raise ValueError("Embedded color supported only in RGB and RGBA modes")
--> 653 if self._multiline_check(text):
    654     return self.multiline_textbbox(
    655         xy,
    656         text,
   (...)
    665         embedded_color,
    666     )
    668 if font is None:

File ~\anaconda3\lib\site-packages\PIL\ImageDraw.py:368, in ImageDraw._multiline_check(self, text)
    365 """Draw text."""
    366 split_character = "\n" if isinstance(text, str) else b"\n"
--> 368 return split_character in text

TypeError: argument of type 'int' is not iterable
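The last frame of the traceback is Pillow's multiline check, which tests whether a newline is contained in each word. A minimal standalone copy of that check (reproduced here for illustration, not imported from Pillow, whose _multiline_check is private) raises the same TypeError when the "word" is an int. That suggests at least one key of the dict passed to generate_from_frequencies was an integer, such as a 0/1 viral label, rather than a string.

```python
def multiline_check(text) -> bool:
    # Same logic as the last traceback frame: use "\n" for str and b"\n"
    # for anything else, then do a membership test on the text
    split_character = "\n" if isinstance(text, str) else b"\n"
    return split_character in text


print(multiline_check("camera"))  # False: a normal word works

try:
    multiline_check(0)  # an int "word", e.g. a 0/1 label used as a dict key
except TypeError as err:
    print(f"TypeError: {err}")
```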

There are a few more things I should mention:

  1. pip says there's a newer version of pip available, but when I run the upgrade command it suggests, the newer version doesn't get installed. I also don't know whether this newer version is needed to use wordcloud.

  2. It also reports a strange package called 'illow' (strange in the sense that it isn't supposed to be there), which I'm guessing is supposed to be Pillow. I know wordcloud needs Pillow, and Pillow does seem to be installed (version 9.5.0).

  3. I then figured I could try running my code on our school's GPU, since I'll need to for the final code anyway, and maybe installing things there would be easier (spoiler alert: it was not). I ran this command to install wordcloud in my environment, as the Anaconda website instructs (I've installed other packages this way too):

    conda install -c conda-forge wordcloud

but I got this error:

> Retrieving notices: ...working... done
> Collecting package metadata (current_repodata.json): done
> Solving environment: failed with initial frozen solve. Retrying with flexible solve.
> Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
> Collecting package metadata (repodata.json): done
> Solving environment: failed with initial frozen solve. Retrying with flexible solve.
> Solving environment: -
> Found conflicts! Looking for incompatible packages.
> This can take several minutes.  Press CTRL-C to abort.
> failed
> 
> UnsatisfiableError: The following specifications were found to be
> incompatible with the existing python installation in your
> environment:
> 
> Specifications:
> 
>   - wordcloud -> python[version='2.7.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|>=3.7,<3.8.0a0|>=3.6,<3.7.0a0|3.4.*']
> 
> Your python: python=3.11
> 
> If python is on the left-most side of the chain, that's the version
> you've asked for. When python appears to the right, that indicates
> that the thing on the left is somehow not available for the python
> version you are constrained to. Note that conda will not change your
> python version to a different minor version unless you explicitly
> specify that.
> 
> The following specifications were found to be incompatible with your
> system:
> 
>   - feature:/linux-64::__glibc==2.31=0
>   - python=3.11 -> libgcc-ng[version='>=11.2.0'] -> __glibc[version='>=2.17']
>   - wordcloud -> libgcc-ng[version='>=9.4.0'] -> __glibc[version='>=2.17']
> 
> Your installed version is: 2.31
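pip's upgrade notice and the 'illow' warning may come from a different environment than the one the notebook actually runs in, and the UnsatisfiableError above lists wordcloud builds only for Python versions below 3.11 while the environment has 3.11. A small diagnostic sketch using only the standard library's importlib.metadata (Python 3.8+) prints the interpreter version and the installed versions of a few relevant packages from inside whatever environment runs it, e.g. a notebook cell. The function names and the package list are assumptions; adjust as needed.

```python
import sys
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return 'not installed'


def environment_report(packages=('pip', 'Pillow', 'wordcloud', 'matplotlib')) -> dict:
    # Version of *this* interpreter, i.e. the one executing the code
    report = {'python': '.'.join(map(str, sys.version_info[:3]))}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == '__main__':
    for name, version in environment_report().items():
        print(f'{name}: {version}')
```

If the reported Python version is 3.11, that matches the conflict conda describes above.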

And... well, I'm doing a data science master's, but we were never taught anything about computer science or how installing all of this works, so I'm lost on where to even begin solving this, and on whether it's even needed to use wordcloud.

Does anyone have an idea of what I can do?

Upvotes: 0

Views: 1161

Answers (1)

shch

Reputation: 16

For the viral labels you defined, use strings instead of integers. Replace your declaration of the viralc list with this:

viralc = ["1", "0", "1", "1", "0", "1", "0", "1", "1", "0"]

With that change, I get these word clouds for viral = 0 and 1:

[Image: word clouds for viral = 0 and viral = 1]
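Building on this answer, an alternative sketch that keeps the integer labels and converts to strings only where text is required: the word keys passed to generate_from_frequencies and the title passed to textwrap.wrap. The helper names to_word_frequencies, wrapped_title and plot_label, the "viral = ..." title text, and dropping zero-weight words are illustrative choices, not part of the answer. font_path is left out here, so WordCloud uses its default font.

```python
from textwrap import wrap

import pandas as pd


def to_word_frequencies(column: pd.Series) -> dict:
    """Turn one column of the transposed document-term matrix into a
    {word: weight} dict with string keys, dropping words with weight 0."""
    return {str(word): float(weight) for word, weight in column.items() if weight > 0}


def wrapped_title(label, width: int = 60) -> str:
    # textwrap.wrap needs a string, so format labels such as 0/1 first
    return '\n'.join(wrap(f'viral = {label}', width))


def plot_label(df_dtm_t: pd.DataFrame, label) -> None:
    # Imported here so the helpers above work without wordcloud installed
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    freqs = to_word_frequencies(df_dtm_t[label])
    wc = WordCloud(width=400, height=330, max_words=150,
                   colormap='Dark2').generate_from_frequencies(freqs)
    plt.figure(figsize=(10, 8))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.title(wrapped_title(label), fontsize=13)
    plt.show()
```

Usage, after df_dtm = df_dtm.transpose(): loop over df_dtm.columns and call plot_label(df_dtm, label) for each label.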

Upvotes: 0
