How to plot the frequency of a specific word through time

I have a dataset like this:

Column1      Column2                                Column3   ....
2020/05/02   She heard the gurgling water          (not relevant)
2020/05/02   The water felt delightful
2020/05/03   Another instant and I shall never again see the sun, this water, that gorge!
2020/05/04   Fire would have been her choice.
2020/05/04   Everywhere you go in world are water fountains.
...
2020/05/31   She spelled "mother" several times.

I would like to plot the frequency of the word 'water' over time. How can I do this?

What I have tried is defining a pattern:

pattern=['water']

and applying re.search:

df['Column2'] = df['Column2'].apply(lambda x: re.search(pattern,x).group(1))

to select the word water in Column2. To group by date and count them, I would use

df.groupby(['Column1','Column2'])['Column1'].agg({'Frequency':'count'})

and to plot them I would use matplotlib (using a bar plot):

df['Column1'].value_counts().plot.bar()

This is what I have tried, with a lot of mistakes.

Upvotes: 1

Answers (3)

wwnde

Reputation: 26676

Chain df.assign and str.count to get the word count for each row, then group by Column1, sum the counts, and plot with either .plot.bar() or .plot(kind='bar').

import matplotlib.pyplot as plt

# count 'water' in each row, sum the counts per date, and draw a bar chart
df.assign(count=df['Column2'].str.count('water')).groupby('Column1')['count'].sum().plot.bar()
# equivalent: df.assign(count=df['Column2'].str.count('water')).groupby('Column1')['count'].sum().plot(kind='bar')
plt.ylabel('Count')
plt.xlabel('Date')
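One caveat with a plain str.count('water'): it counts substrings, so 'waterfall' is counted and 'Water' is not. If whole-word, case-insensitive matching is what you want, a regex with word boundaries is one option. The sketch below uses made-up sample rows (not the data from the question) to show the difference.

```python
import re
import pandas as pd

# Hypothetical sample rows, chosen only to show the difference
df = pd.DataFrame({
    "Column1": ["2020/05/02", "2020/05/02", "2020/05/03"],
    "Column2": ["Water is wet", "The waterfall roared", "water, water everywhere"],
})

# Plain substring count: case-sensitive, also matches inside "waterfall"
substring_count = df["Column2"].str.count("water")

# Whole-word count: \b marks word boundaries, IGNORECASE also matches "Water"
word_count = df["Column2"].str.count(r"\bwater\b", flags=re.IGNORECASE)

# Sum the whole-word counts per date, ready for .plot.bar()
daily = word_count.groupby(df["Column1"]).sum()
```

Here substring_count is 0, 1, 2 and word_count is 1, 0, 2, so the two approaches can give different charts on real text.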


Upvotes: 1

Derek O

Reputation: 19610

You can use Python's built-in str.count(substring) method to count the occurrences in each row, then sum those counts for each day.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

df = pd.DataFrame({'Column1':['2020/05/02','2020/05/02','2020/05/03','2020/05/04','2020/05/04'],
    'Column2':["She heard the gurgling water", "The water felt delightful",
    "Another instant and I shall never again see the sun, this water, that gorge!",
    "Fire would have been her choice.",
    "Everywhere you go in world are water fountains"]})
# convert the date strings to datetimes using their explicit format
df['Column1'] = pd.to_datetime(df['Column1'], format='%Y/%m/%d')

pattern = "water"

df['Frequency'] = df['Column2'].apply(lambda x: x.count(pattern))

# sum the frequency of the word 'water' over each separate day
ax = df['Frequency'].groupby(df['Column1'].dt.to_period('D')).sum().plot(kind='bar')

# force integer yaxis labels
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ax.tick_params(axis='x', which='major', labelsize=6)

# Rotate tick marks on x-axis
plt.setp(ax.get_xticklabels(), rotation = 90)

plt.show()
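Note that grouping only produces bars for dates that appear in the data, so days with no rows are skipped on the x-axis. If a continuous time axis matters, one possible approach is to resample on a datetime index, which fills the missing days with 0. A minimal sketch, assuming a shortened version of the sample data with an artificial gap (the 2020/05/06 row is made up for illustration):

```python
import pandas as pd

# Shortened sample data; the 2020/05/06 row is hypothetical, added to create a gap
df = pd.DataFrame({
    "Column1": ["2020/05/02", "2020/05/02", "2020/05/03", "2020/05/06"],
    "Column2": ["She heard the gurgling water", "The water felt delightful",
                "this water, that gorge!", "water fountains"],
})
df["Column1"] = pd.to_datetime(df["Column1"], format="%Y/%m/%d")
df["Frequency"] = df["Column2"].str.count("water")

# resample("D") makes one bin per calendar day between the first and last date;
# summing an empty bin gives 0, so the missing days show up as zero-height bars
daily = df.set_index("Column1")["Frequency"].resample("D").sum()
```

daily then has five entries (2020-05-02 through 2020-05-06) and can be plotted with daily.plot(kind='bar') as before.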

Upvotes: 1

Ian

Reputation: 3908

Setup

import pandas as pd

df = pd.DataFrame({
    "Column1": ["2020/05/02", "2020/05/02", "2020/05/03", "2020/05/04", "2020/05/04", "2020/05/31"],
    "Column2": [
        "She heard the gurgling water water",
        "The water felt delightful",
        "Another instant and I shall never again see the sun, this water, that gorge!",
        "Fire would have been her choice.",
        "Everywhere you go in world are water fountains.",
        "She spelled 'mother' several times.",
    ]
})

Logic

# for each string, get the number of times a phrase appears
df['phrase_count'] = df['Column2'].str.count('water')

# plot the results
df.groupby('Column1')['phrase_count'].sum().plot(kind='bar')

Results
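The numbers behind the bar chart can be checked without plotting. The sketch below re-creates the setup frame so it runs on its own and keeps the grouped totals in a variable (per_day is a name introduced here for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": ["2020/05/02", "2020/05/02", "2020/05/03",
                "2020/05/04", "2020/05/04", "2020/05/31"],
    "Column2": [
        "She heard the gurgling water water",
        "The water felt delightful",
        "Another instant and I shall never again see the sun, this water, that gorge!",
        "Fire would have been her choice.",
        "Everywhere you go in world are water fountains.",
        "She spelled 'mother' several times.",
    ],
})

# per-row counts: 2, 1, 1, 0, 1, 0
df["phrase_count"] = df["Column2"].str.count("water")

# per-day totals: 2020/05/02 -> 3, 2020/05/03 -> 1, 2020/05/04 -> 1, 2020/05/31 -> 0
per_day = df.groupby("Column1")["phrase_count"].sum()
print(per_day)
```

The first row contributes 2 because 'water' appears twice in it, which is why 2020/05/02 totals 3.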

Upvotes: 2
