Reputation: 515
I have a dataset of around 13,000 rows and 2 columns (text and date) covering a two-year period. One of the columns is a date in yyyy-mm-dd format. I want to perform a time series analysis where the x axis would be the date (each day) and the y axis would be the frequency of text on the corresponding date.
I think that creating a new data frame with the unique dates and the number of text entries on each date would solve my problem.
Sample data
How can I create a new column with the frequency of text each day? For example:
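Something like this (layout only, the values are made up just to show the shape I am after):

Date        frequency
2017-01-01  4
2017-01-02  0
2017-01-03  7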
Thanks in Advance!
Upvotes: 0
Views: 849
Reputation: 583
Depending on the task you are trying to solve, I can see two options for this dataset.
First things to do:
import pandas as pd
df = pd.DataFrame(data={'Date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-03'],
                        'Text': ['A', 'B', 'C', 'A', 'A']})  # small sample frame, one text entry per row
df['Date'] = pd.to_datetime(df['Date'])  # convert to datetime type if not already done
Date Text
0 2018-01-01 A
1 2018-01-01 B
2 2018-01-01 C
3 2018-01-02 A
4 2018-01-03 A
Then for option one:
df = df.groupby('Date').count()
Text
Date
2018-01-01 3
2018-01-02 1
2018-01-03 1
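If the end goal is the daily time series described in the question, the counts from option one can be reindexed onto every calendar day so that days with no text show up as 0. A minimal sketch, starting again from the sample frame and assuming matplotlib is available for .plot():

import pandas as pd

df = pd.DataFrame(data={'Date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-03'],
                        'Text': ['A', 'B', 'C', 'A', 'A']})
df['Date'] = pd.to_datetime(df['Date'])

counts = df.groupby('Date').count().rename(columns={'Text': 'frequency'})

# reindex onto a full daily range so missing dates get a frequency of 0
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq='D')
counts = counts.reindex(full_range, fill_value=0)

counts.plot()  # x axis: date, y axis: number of text entries that day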
For option two:
dummies = pd.get_dummies(df['Text'])           # one indicator column per distinct Text value (A, B, C)
df = pd.concat([df['Date'], dummies], axis=1)  # keep Date next to the indicators, dropping the original Text column
df = df.groupby('Date').sum()
A B C
Date
2018-01-01 1 1 1
2018-01-02 1 0 0
2018-01-03 1 0 0
The get_dummies function creates one column per possible value of the Text field. Each column is a boolean indicator telling us, for each row of the dataframe, whether that value of the Text field occurred in it. We can then simply sum these indicators in a groupby on the Date field.
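As a side note, the same Date-by-Text count table can also be built in a single call with pd.crosstab, which may be a convenient shortcut here:

import pandas as pd

df = pd.DataFrame(data={'Date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-03'],
                        'Text': ['A', 'B', 'C', 'A', 'A']})
df['Date'] = pd.to_datetime(df['Date'])

# rows are dates, columns are the distinct Text values, cells are how often each value occurred that day
table = pd.crosstab(df['Date'], df['Text'])
print(table)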
If you are not familiar with groupby and aggregation operations, I recommend that you read this guide first.
Upvotes: 1