daiyue
daiyue

Reputation: 7448

pandas calculates column value means on groups and means across whole dataframe

I have a df, df['period'] = (df['date1'] - df['date2']) / np.timedelta64(1, 'D')

code    y_m        date1        date2         period    
1000    201701    2017-12-10   2017-12-09       1
1000    201701    2017-12-14   2017-12-12       2
1000    201702    2017-12-15   2017-12-13       2
1000    201702    2017-12-17   2017-12-15       2
2000    201701    2017-12-19   2017-12-18       1
2000    201701    2017-12-12   2017-12-10       2
2000    201702    2017-12-11   2017-12-10       1
2000    201702    2017-12-13   2017-12-12       1
2000    201702    2017-12-11   2017-12-10       1

then groupby code and y_m to calculate the average of date1-date2,

df_avg_period = df.groupby(['code', 'y_m'])['period'].mean().reset_index(name='avg_period')

code        y_m        avg_period
1000        201701     1.5
1000        201702     2
2000        201701     1.5
2000        201702     1

but I like to convert df_avg_period into a matrix that transposes column code to rows and y_m to columns, like

      0     1     2             3              
 0   -1     0    201701       201702       
 1   0     1.44  1.44          1.4         
 2   1000  1.75  1.5           2     
 3   2000  1.20  1.5           1

-1 represents a dummy value that indicates either a value doesn't exist for a specific code/y_m cell or to maintain matrix shape; 0 represents 'all' values, that averages the code or y_m or code and y_m, e.g. cell (1,1) averages the period values for all rows in df; (1,2) averages the period for 201701 across all rows that have this value for y_m in df.

apparently pivot_table cannot give correct results using mean. so I am wondering how to achieve that correctly?

Upvotes: 1

Views: 65

Answers (1)

cs95
cs95

Reputation: 402423

pivot_table with margins=True

piv = df.pivot_table(
    index='code', columns='y_m', values='period', aggfunc='mean', margins=True
)
# housekeeping
(piv.reset_index()
    .rename_axis(None, 1)
    .rename({'code' : -1, 'All' : 0}, axis=1)
    .sort_index(axis=1)
)

  -1         0        201701   201702
0    1000  1.750000      1.5      2.0
1    2000  1.200000      1.5      1.0
2     All  1.444444      1.5      1.4

Upvotes: 1

Related Questions