Plot occurrences over time of specific words in a large dataset of texts (tweets) in Python

Question

I need to plot how often a word occurs over time in a pandas DataFrame (a time series) that has a text column.

The dataframe looks like this:

index                date        ...  text
2020-10-20 20:20:00  2020-10-20  ...  "The text goes here"
...
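For a reproducible example, here is a small hypothetical DataFrame with the same layout: a `DatetimeIndex` of tweet timestamps, a `date` column holding the calendar day, and a `text` column. The rows are made up for illustration, and the real data presumably has more columns (the `...` above).

```python
import pandas as pd

# Hypothetical tweet timestamps; these become the DatetimeIndex
timestamps = pd.to_datetime(
    [
        "2020-10-20 20:20:00",
        "2020-10-20 21:05:00",
        "2020-10-21 09:15:00",
        "2020-10-21 10:30:00",
        "2020-10-22 08:00:00",
    ]
)
df = pd.DataFrame(
    {
        # normalize() drops the time of day, leaving midnight of each date
        "date": timestamps.normalize(),
        "text": [
            "The text goes here",
            "Nothing to see",
            "Here and here again",
            "Somewhere else",
            "Right here",
        ],
    },
    index=timestamps,
)
print(df)
```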

What I want is a graph that shows the occurrence of a specific word (for example "here") over time.
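A minimal sketch of the single-word case, assuming the index is a `DatetimeIndex` of tweet timestamps and the text column is named `text`, as in the sample above. `daily_word_count` is a hypothetical helper name and the data is made up. It uses a word-boundary regex so that "here" does not also match "somewhere", and `resample` to bin by day, so days with no matches show up as 0 instead of disappearing.

```python
import re

import pandas as pd


def daily_word_count(df: pd.DataFrame, word: str, freq: str = "D") -> pd.Series:
    """Total occurrences of `word` per time bin, case-insensitive, whole words only.

    Assumes `df` has a DatetimeIndex and a 'text' column.
    """
    # \b on both sides keeps "here" from matching inside "somewhere"
    pattern = rf"\b{re.escape(word)}\b"
    # Number of matches in each tweet (missing text gives NaN, which sum() skips)
    per_tweet = df["text"].str.count(pattern, flags=re.IGNORECASE)
    # Bin by the DatetimeIndex and add up the per-tweet counts
    return per_tweet.resample(freq).sum()


# Hypothetical sample data
idx = pd.to_datetime(
    ["2020-10-20 20:20", "2020-10-20 21:05", "2020-10-21 09:15",
     "2020-10-21 10:30", "2020-10-22 08:00"]
)
df = pd.DataFrame(
    {"text": ["The text goes here", "Nothing to see", "Here and here again",
              "Somewhere else", "Right here"]},
    index=idx,
)

counts = daily_word_count(df, "here")
print(counts)
```

`counts.plot(rot=90)` followed by `plt.show()` draws the line (pandas plotting uses matplotlib). This counts every mention; to count tweets that contain the word instead, swap `str.count` for `str.contains(pattern, flags=re.IGNORECASE, na=False)` and sum the result. Passing `freq="W"` gives weekly bins.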

Here is what I currently have. It works, but it is very slow on large data and when I need several words:

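import matplotlib.pyplot as plt

# df is the tweet DataFrame described above (with 'date' and 'text' columns)
# Flag each tweet that contains the word as 1/0; na=False treats missing text as "no match"
df['contains_word'] = df['text'].str.contains('word', na=False).astype(int)

# sum() adds up the 1s per day; count() would only count the rows (i.e. all tweets)
g = df.groupby('date')['contains_word'].sum()
plt.plot(g.index, g, c='r')
plt.xticks(rotation=90)
plt.title('xxx')
plt.show()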

And here is the example output:

[Figure: example output plot]
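For many words at once, one option (a swapped-in technique, not the approach in the code above) is to tokenize each tweet a single time and count all target words in one `groupby`, instead of scanning the whole text column once per word. A sketch, assuming `date` and `text` columns as in the question; `word_counts_by_date` is a hypothetical helper name and the data is made up.

```python
import pandas as pd


def word_counts_by_date(df: pd.DataFrame, words: list[str]) -> pd.DataFrame:
    """Occurrences of each word in `words` per date, one column per word.

    Assumes `df` has 'date' and 'text' columns; matching is case-insensitive.
    """
    targets = [w.lower() for w in words]
    # Tokenize every tweet once: lowercase, then pull out runs of word characters
    tokens = df["text"].str.lower().str.findall(r"\w+")
    # One row per (tweet, token), carrying the tweet's date along
    long = pd.DataFrame({"date": df["date"], "token": tokens}).explode("token")
    # Keep only the words of interest (also drops NaN from empty or missing texts)
    long = long[long["token"].isin(targets)]
    # Count per (date, word), pivot words into columns, fill gaps with 0
    all_dates = pd.Index(df["date"]).unique().sort_values()
    return (
        long.groupby(["date", "token"]).size()
        .unstack(fill_value=0)
        .reindex(index=all_dates, columns=targets, fill_value=0)
    )


# Hypothetical sample data
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-10-20", "2020-10-20", "2020-10-21",
                                "2020-10-21", "2020-10-22"]),
        "text": ["The text goes here", "Nothing to see", "Here and here again",
                 "Somewhere else", "Right here"],
    }
)

counts = word_counts_by_date(df, ["here", "text"])
print(counts)
```

`counts.plot(rot=90)` followed by `plt.show()` draws one line per word. This counts every mention of a word, whereas the original code counts tweets that contain it; the two differ when a word appears more than once in a tweet. Because the text is tokenized only once no matter how many words are tracked, this should scale better as the word list grows; for one or two words the difference is likely small. Note that `\w+` splits "don't" into "don" and "t" and drops the `#` from hashtags, so adjust the pattern if that matters.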
