Caesar Tex
Caesar Tex

Reputation: 295

How to convert a tokenized dataframe to string to generate a wordcloud

So I'm reading an excel file to a dataframe and then normalizing it ( lowercase, stopwords..etc)

Now my dataframe has multiple columns from the excel file but only the ones I needed and it looks something like below. I had to tokenize it.

df['col1']

0 [this, is , fun, interesting]
1 [this, is, fun, too]
2 [ even, more, fun]

I have more similar columns like df['col2'] and so on.

Now I want to generate a word cloud

from wordcloud import WordCloud
text = WordCloud().generate(df['col'])
plt.imshow(text)
plt.axis("off")
plt.show()

I'm trying to generate a wordcloud but this isn't working since apparently word cloud expects a string. How do I convert my entire dataframe to string?

I want to convert entire dataframe to string and then generate a wordcloud but if that's not possible then atleast a wordcloud per column would be nice.

Upvotes: 2

Views: 1264

Answers (2)

b-fg
b-fg

Reputation: 4137

You just need to convert your columns to string as so far you only have a list of strings which WordCloud cannot take. Simply,

text = WordCloud().generate(df['col1'].to_string())

And your output image is enter image description here

Upvotes: 2

BernardL
BernardL

Reputation: 5434

You should first consider if you are processing your data right, it seems to defeat the purpose of tokenizing it and then putting it all together again.

If you have to do it anyways, you can get the values from your columns and use chain from the Python standard module library to chain them together, then join them to get a string representation of all the words.

import pandas as pd
from itertools import chain

df = pd.DataFrame({'col1':[['this', 'is' , 'fun', 'interesting'],['this', 'is', 'fun', 'too'],['even','more']]})
word_list = list(chain.from_iterable(df.col1.values))
words = ' '.join(word_list)

words
>>'this is fun interesting this is fun too even more'

If this is done for multiple columns, you will have to append each of the column values together before you chain them.

Upvotes: 0

Related Questions