jbastos
jbastos

Reputation: 1217

Python Pandas: transforming - moving values from diagonal

given the following DataFrame, grouped with:

    dataset = z.groupby(
        ['app', 'regmonth', 'loginsmonth']).sum().unstack().fillna(
            0, inplace=False)


                             cnt                                      
loginsmonth           2014-02-01  2014-03-01  2014-04-01  2014-05-01   
app       regmonth                                                     
1         2014-02-01        6069        1837         107          54   
          2014-03-01           0       10742        2709        1394   
          2014-04-01           0           0        5584        1107   
          2014-05-01           0           0           0        3044   
          2014-06-01           0           0           0           0   

I would like to transform it to:

                             cnt                                      
loginsmonth           2014-02-01  2014-03-01  2014-04-01  2014-05-01   
app       regmonth                                                     
1         2014-02-01        6069        1837         107          54   
          2014-03-01       10742        2709        1394           0   
          2014-04-01        5584        1107           0           0   
          2014-05-01        3044           0           0           0
          2014-06-01           0           0           0           0   

So, it moves the diagonal to the beginning of the line and fills the void with zeros. Does panda have any easy way of doing it?

Upvotes: 6

Views: 520

Answers (1)

CT Zhu
CT Zhu

Reputation: 54330

But you are changing your data in the process, right?

I don't know if pandas has a nice way to do it, but np.diagnoal can do what you want here:

In [96]:

print df
loginsmonth     2014-02-01  2014-03-01  2014-04-01  2014-05-01
app regmonth                                                  
1   2014-02-01        6069        1837         107          54
    2014-03-01           0       10742        2709        1394
    2014-04-01           0           0        5584        1107
    2014-05-01           0           0           0        3044
    2014-06-01           0           0           0           0

[5 rows x 4 columns]
In [124]:

print df*0+np.asarray([np.hstack((np.diagonal(df.values, i), np.zeros(i+1, int))) 
                       for i in range(df.shape[1])]).T
loginsmonth     2014-02-01  2014-03-01  2014-04-01  2014-05-01
app regmonth                                                  
1   2014-02-01        6069        1837         107          54
    2014-03-01       10742        2709        1394           0
    2014-04-01        5584        1107           0           0
    2014-05-01        3044           0           0           0
    2014-06-01           0           0           0           0

[5 rows x 4 columns]

here the 1 in np.zeros(i+1, int) is df.shape[0]-df.shape[1]. I don't know whether you will have situation where df.shape[0]<df.shape[1].

But if your DataFrame always looks like the one shown here, which is like a upper diagonal matrix without 0s inside, you can just take a short cut:

In [134]:

print df.apply(lambda x: sorted(x, key=lambda y: y==0), axis=1)
cnt             2014-02-01  2014-03-01  2014-04-01  2014-05-01
app regmonth                                                  
1   2014-02-01        6069        1837         107          54
    2014-03-01       10742        2709        1394           0
    2014-04-01        5584        1107           0           0
    2014-05-01        3044           0           0           0
    2014-06-01           0           0           0           0

[5 rows x 4 columns] 

Upvotes: 4

Related Questions