Reputation: 295
So I'm reading an Excel file into a DataFrame and then normalizing it (lowercasing, removing stopwords, etc.).
My DataFrame now has multiple columns from the Excel file (only the ones I need), and I also had to tokenize the text, so it looks something like this:
df['col1']
0    [this, is, fun, interesting]
1    [this, is, fun, too]
2    [even, more, fun]
I have more similar columns like df['col2'] and so on.
Now I want to generate a word cloud:
import matplotlib.pyplot as plt
from wordcloud import WordCloud
text = WordCloud().generate(df['col1'])
plt.imshow(text)
plt.axis("off")
plt.show()
This doesn't work, apparently because WordCloud expects a string. How do I convert my entire DataFrame to a string?
Ideally I'd convert the entire DataFrame to a string and generate one word cloud from it, but if that's not possible, at least one word cloud per column would be nice.
Upvotes: 2
Views: 1264
Reputation: 4137
You just need to convert your column to a string. So far each cell holds a list of strings, which WordCloud cannot take. Simply:
text = WordCloud().generate(df['col1'].to_string())
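A caveat on the `to_string()` approach, shown as a minimal sketch on toy data mirroring the question: `Series.to_string()` returns the printed form of the column, so the text also contains the index labels and the list brackets and commas. WordCloud's tokenizer may filter some of that out, but `to_string(index=False)` at least drops the index, and joining the tokens directly avoids the issue altogether. The variable names `raw` and `clean` are illustrative only.

```python
import pandas as pd

# Toy data mirroring the question: each cell holds a list of tokens.
df = pd.DataFrame({'col1': [['this', 'is', 'fun', 'interesting'],
                            ['this', 'is', 'fun', 'too'],
                            ['even', 'more', 'fun']]})

# to_string() renders the printed Series, so index labels and
# list brackets/commas end up in the text as well.
raw = df['col1'].to_string()

# Joining the tokens directly keeps only the words:
# str.join(' ') turns each list into one string, then ' '.join merges the rows.
clean = ' '.join(df['col1'].str.join(' '))
print(clean)  # this is fun interesting this is fun too even more fun
```

`clean` can then be passed to `WordCloud().generate(...)` in place of `df['col1'].to_string()`.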
Upvotes: 2
Reputation: 5434
You should first consider whether you are processing your data correctly: tokenizing the text and then joining it all back together seems to defeat the purpose of tokenizing.
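One way to keep the tokens as they are is to count them and hand the counts to `WordCloud.generate_from_frequencies`, which accepts a word-to-frequency mapping instead of raw text. This is a minimal sketch on toy data; the helper name `render_wordcloud` is hypothetical, and as far as I know `generate_from_frequencies` skips WordCloud's own text processing (such as stopword removal), which should be fine here since the stopwords were already removed.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Toy data mirroring the question: each cell holds a list of tokens.
df = pd.DataFrame({'col1': [['this', 'is', 'fun', 'interesting'],
                            ['this', 'is', 'fun', 'too'],
                            ['even', 'more', 'fun']]})

# Count tokens straight from the list cells; no string round-trip needed.
freqs = Counter(chain.from_iterable(df['col1']))


def render_wordcloud(frequencies):
    """Hypothetical helper: draw a word cloud from a word -> count mapping."""
    # Imported lazily so the counting above runs without wordcloud installed.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    wc = WordCloud().generate_from_frequencies(frequencies)
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()
```

Calling `render_wordcloud(freqs)` would then draw the cloud for `col1`.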
If you have to do it anyway, you can take the values from your column, use `chain` from the Python standard library's `itertools` module to chain them together, and then join them into a single string containing all the words.
import pandas as pd
from itertools import chain
df = pd.DataFrame({'col1':[['this', 'is' , 'fun', 'interesting'],['this', 'is', 'fun', 'too'],['even','more']]})
word_list = list(chain.from_iterable(df.col1.values))
words = ' '.join(word_list)
words
>>'this is fun interesting this is fun too even more'
If you need to do this for multiple columns, you have to combine the values of all the columns before you chain them.
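A minimal sketch of the multi-column case, assuming every selected column holds lists of tokens: the helper `columns_to_text` (a hypothetical name) chains the cells of all columns into one token stream and joins it, and the dictionary comprehension builds one string per column for the "one word cloud per column" variant. The `col2` data is made up for illustration.

```python
from itertools import chain

import pandas as pd

# Toy data: two columns whose cells are lists of tokens.
df = pd.DataFrame({'col1': [['this', 'is', 'fun', 'interesting'],
                            ['this', 'is', 'fun', 'too'],
                            ['even', 'more', 'fun']],
                   'col2': [['more', 'words'], ['here'], ['too']]})


def columns_to_text(frame, columns=None):
    """Join the token lists of the given columns (default: all) into one string."""
    columns = list(frame.columns) if columns is None else columns
    # Iterating a column yields its cells (token lists); chain them across columns.
    token_lists = chain.from_iterable(frame[c] for c in columns)
    # Flatten the token lists into individual tokens and join with spaces.
    return ' '.join(chain.from_iterable(token_lists))


# One string for the whole DataFrame, e.g. for WordCloud().generate(all_text).
all_text = columns_to_text(df)

# One string per column, e.g. for one word cloud per column.
per_column = {c: columns_to_text(df, [c]) for c in df.columns}
```

The columns are processed in order, so all of `col1`'s words come before `col2`'s; for a word cloud the order does not matter, only the word counts.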
Upvotes: 0