dlvr
dlvr

Reputation: 93

Plot the result of a groupby operation in pandas

I have this sample table:

    ID   Date      Days Volume/Day
0   111 2016-01-01  20  50
1   111 2016-02-01  25  40
2   111 2016-03-01  31  35
3   111 2016-04-01  30  30
4   111 2016-05-01  31  25
5   111 2016-06-01  30  20
6   111 2016-07-01  31  20
7   111 2016-08-01  31  15
8   111 2016-09-01  29  15
9   111 2016-10-01  31  10
10  111 2016-11-01  29  5
11  111 2016-12-01  27  0
0   112 2016-01-01  31  55
1   112 2016-02-01  26  45
2   112 2016-03-01  31  40
3   112 2016-04-01  30  35
4   112 2016-04-01  31  30
5   112 2016-05-01  30  25
6   112 2016-06-01  31  25
7   112 2016-07-01  31  20
8   112 2016-08-01  30  20
9   112 2016-09-01  31  15
10  112 2016-11-01  29  10
11  112 2016-12-01  31  0

I'm trying to make my table final table look like this below after grouping by ID and Date.

    ID   Date      CumDays  Volume/Day
0   111 2016-01-01  20       50
1   111 2016-02-01  45       40
2   111 2016-03-01  76       35
3   111 2016-04-01  106      30
4   111 2016-05-01  137      25
5   111 2016-06-01  167      20
6   111 2016-07-01  198      20
7   111 2016-08-01  229      15
8   111 2016-09-01  258      15
9   111 2016-10-01  289      10
10  111 2016-11-01  318      5
11  111 2016-12-01  345      0
0   112 2016-01-01  31       55
1   112 2016-02-01  57       45
2   112 2016-03-01  88       40
3   112 2016-04-01  118      35
4   112 2016-05-01  149      30
5   112 2016-06-01  179      25
6   112 2016-07-01  210      25
7   112 2016-08-01  241      20
8   112 2016-09-01  271      20
9   112 2016-10-01  302      15
10  112 2016-11-01  331      10
11  112 2016-12-01  362       0

Next, I want to be able to extract the first value of Volume/Day per ID, all the CumDays values and all the Volume/Day values per ID and Date. So I can use them for further computation and plotting Volume/Day vs CumDays. Example for ID:111, the first value of Volume/Day will be only 50 and ID:112, it will be only 55. All CumDays values for ID:111 will be 20,45... and ID:112, it will be 31,57...For all Volume/Day --- ID:111, will be 50, 40... and ID:112 will be 55,45...

My solution:

def get_time_rate(grp_df):
    t = grp_df['Days'].cumsum()
    r = grp_df['Volume/Day']
    return t,r

vals = df.groupby(['ID','Date']).apply(get_time_rate)
vals

Doing this, the cumulative calculation doesn't take effect at all. It returns the original Days value. This didn't allow me move further in extracting the first value of Volume/Day, all the CumDays values and all the Volume/Day values I need. Any advice or help on how to go about it will be appreciated. Thanks

Upvotes: 1

Views: 4548

Answers (1)

cs95
cs95

Reputation: 402353

Get a groupby object.

g = df.groupby('ID')

Compute columns with transform:

df['CumDays'] = g.Days.transform('cumsum')
df['First Volume/Day'] = g['Volume/Day'].transform('first')
df

     ID        Date  Days  Volume/Day  CumDays  First Volume/Day
0   111  2016-01-01    20          50       20                50
1   111  2016-02-01    25          40       45                50
2   111  2016-03-01    31          35       76                50
3   111  2016-04-01    30          30      106                50
4   111  2016-05-01    31          25      137                50
5   111  2016-06-01    30          20      167                50
6   111  2016-07-01    31          20      198                50
7   111  2016-08-01    31          15      229                50
8   111  2016-09-01    29          15      258                50
9   111  2016-10-01    31          10      289                50
10  111  2016-11-01    29           5      318                50
11  111  2016-12-01    27           0      345                50
0   112  2016-01-01    31          55       31                55
1   112  2016-01-02    26          45       57                55
2   112  2016-01-03    31          40       88                55
3   112  2016-01-04    30          35      118                55
4   112  2016-01-05    31          30      149                55
5   112  2016-01-06    30          25      179                55
6   112  2016-01-07    31          25      210                55
7   112  2016-01-08    31          20      241                55
8   112  2016-01-09    30          20      271                55
9   112  2016-01-10    31          15      302                55
10  112  2016-01-11    29          10      331                55
11  112  2016-01-12    31           0      362                55

If you want grouped plots, you can iterate over each groups after grouping by ID. To plot, first set index and call plot.

fig, ax = plt.subplots(figsize=(8,6))
for i, g in df2.groupby('ID'):
    g.plot(x='CumDays', y='Volume/Day', ax=ax, label=str(i))

plt.show()

enter image description here

Upvotes: 4

Related Questions