Reputation: 34328
Below is an extract of a dataframe which I have created my merging multiple query log dataframes:
keyword hits date average time
1 the cat sat on 10 10-Jan 10
2 who is the sea 5 10-Jan 1.2
3 under the earth 30 1-Dec 2.5
4 what is this 100 1-Feb 9
Is there a way I can pivot the data using Pandas so that rows are daily dates (e.g. 1-Jan, 2-Jan etc.) and the corresponding 1 column to each date is the daily sum of hits (sum of the hits for that day e.g. sum of hits for 1-Jan) divided by the monthly sum of hits (e.g. for the whole of Jan) for that month (i.e. the month normalised daily hit percentage for each day)
Upvotes: 0
Views: 291
Reputation: 35235
Parse the dates so we can extract the month later.
In [99]: df.date = df.date.apply(pd.Timestamp)
In [100]: df
Out[100]:
keyword hits date average time
1 the cat sat on 10 2013-01-10 00:00:00 10.0
2 who is the sea 5 2013-01-10 00:00:00 1.2
3 under the earth 30 2013-12-01 00:00:00 2.5
4 what is this 100 2013-02-01 00:00:00 9.0
Group by day and sum the hits.
In [101]: daily_totals = df.groupby('date').hits.sum()
In [102]: daily_totals
Out[102]:
date
2013-01-10 15
2013-02-01 100
2013-12-01 30
Name: hits, dtype: int64
Group by month, and divide each row (each daily total) by the sum of all the daily totals in that month.
In [103]: normalized_totals = daily_totals.groupby(lambda d: d.month).transform(lambda x: float(x)/x.sum())
In [104]: normalized_totals
Out[104]:
date
2013-01-10 1
2013-02-01 1
2013-12-01 1
Name: hits, dtype: int64
Your simple example only gave one day in each month, so all these are 1.
Upvotes: 1