user7289
user7289

Reputation: 34328

Reshaping tables in pandas

Below is an extract of a dataframe which I have created my merging multiple query log dataframes:

                keyword               hits         date         average time
1               the cat sat on        10           10-Jan       10
2               who is the sea        5            10-Jan       1.2
3               under the earth       30           1-Dec        2.5
4               what is this          100          1-Feb        9

Is there a way I can pivot the data using Pandas so that rows are daily dates (e.g. 1-Jan, 2-Jan etc.) and the corresponding 1 column to each date is the daily sum of hits (sum of the hits for that day e.g. sum of hits for 1-Jan) divided by the monthly sum of hits (e.g. for the whole of Jan) for that month (i.e. the month normalised daily hit percentage for each day)

Upvotes: 0

Views: 291

Answers (1)

Dan Allan
Dan Allan

Reputation: 35235

Parse the dates so we can extract the month later.

In [99]: df.date = df.date.apply(pd.Timestamp)

In [100]: df
Out[100]: 
           keyword  hits                date  average time
1   the cat sat on    10 2013-01-10 00:00:00          10.0
2   who is the sea     5 2013-01-10 00:00:00           1.2
3  under the earth    30 2013-12-01 00:00:00           2.5
4     what is this   100 2013-02-01 00:00:00           9.0

Group by day and sum the hits.

In [101]: daily_totals = df.groupby('date').hits.sum()

In [102]: daily_totals
Out[102]: 
date
2013-01-10     15
2013-02-01    100
2013-12-01     30
Name: hits, dtype: int64

Group by month, and divide each row (each daily total) by the sum of all the daily totals in that month.

In [103]: normalized_totals = daily_totals.groupby(lambda d: d.month).transform(lambda x: float(x)/x.sum())

In [104]: normalized_totals
Out[104]: 
date
2013-01-10    1
2013-02-01    1
2013-12-01    1
Name: hits, dtype: int64

Your simple example only gave one day in each month, so all these are 1.

Upvotes: 1

Related Questions