malloc

Reputation: 684

How to draw a time series of word occurrences in Matplotlib with Python?

I have a text file with this content:

    'word' , 'timestamp'
    word1 , 1546403642
    word2 , 1546392481
    word1 , 1546403642
    word3 , 1546394402
    ...

The first field is the word (at most 10 distinct words, each occurring multiple times) and the second is the Unix timestamp at which that word occurred.

I have no problem reading and parsing this CSV file with Pandas and converting the Unix timestamps to another format, but I don't know how to plot it in Matplotlib to show each word's occurrences over time, something like this: [image]

I am looking for a hint, a library, or a close example of how to plot this. I couldn't find any similar time-series example.

I found some examples in two linked tutorials, but I can't apply them to my data because they have the number of occurrences in each row and my data doesn't.

Any help would be appreciated.

Upvotes: 0

Views: 798

Answers (1)

Dani G

Reputation: 1252

You need to decide over what timeframe you want to aggregate the word counts. For example, if you want a monthly count, you can do this:

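import pandas as pd

# df is assumed to already hold the parsed file, with columns 'word' and 'timestamp'
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')  # unit='s': values are Unix seconds
df.set_index('timestamp', inplace=True)
df = pd.get_dummies(df)       # one 0/1 indicator column per word
df = df.resample('1M').sum()  # monthly totals per word (the alias is 'ME' in pandas >= 2.2)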

get_dummies creates a column for each word, holding 0 or 1 in every row. You then resample by the timeframe you choose and aggregate by summing, so the result is the number of occurrences of each word per period.
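To make the two steps concrete, here is a minimal sketch run on the four sample rows from the question. The hourly bin and the variable names (`dummies`, `counts`) are my own assumptions for illustration; the sample timestamps all fall within a few hours of one day, so a monthly bin would collapse them into a single row.

```python
import pandas as pd

# The four sample rows from the question (timestamps are Unix seconds).
df = pd.DataFrame({
    "word": ["word1", "word2", "word1", "word3"],
    "timestamp": [1546403642, 1546392481, 1546403642, 1546394402],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")

# One 0/1 indicator column per word, indexed (and sorted) by time.
dummies = pd.get_dummies(df.set_index("timestamp")["word"], dtype=int).sort_index()

# Assumption: hourly bins. Summing the indicators gives occurrences per bin.
counts = dummies.resample("1h").sum()
print(counts)
```

Each row of `counts` is one hour and each column is one word, which is the "number of occurrences in each row" shape that the plotting tutorials expect.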

Now you can plot it via the tutorials in the links that you provided.
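If it helps, a rough sketch of the plotting step could look like the following. It rebuilds the counts from the question's sample rows so that it runs on its own; the hourly bin, the `Agg` backend, and the output filename `word_counts.png` are assumptions for illustration, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption) so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; replace with your own parsed DataFrame.
df = pd.DataFrame({
    "word": ["word1", "word2", "word1", "word3"],
    "timestamp": [1546403642, 1546392481, 1546403642, 1546394402],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")

# Occurrences per word per hour (hourly bin is an assumption).
counts = (
    pd.get_dummies(df.set_index("timestamp")["word"], dtype=int)
    .sort_index()
    .resample("1h")
    .sum()
)

# One line per word, time on the x-axis.
fig, ax = plt.subplots(figsize=(8, 4))
for word in counts.columns:
    ax.plot(counts.index, counts[word], marker="o", label=word)
ax.set_xlabel("time")
ax.set_ylabel("occurrences per hour")
ax.legend(title="word")
fig.autofmt_xdate()  # tilt the date labels so they don't overlap
fig.savefig("word_counts.png")
```

With only around 10 distinct words, one line per word stays readable; a stacked bar chart via `counts.plot(kind="bar", stacked=True)` would be another option.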

Upvotes: 2
