tj judge
tj judge

Reputation: 616

Calculate slope based on axis in rows

I have a dataframe:

  Name  Segment  Axis  1   2    3   4   5
  Amazon  1       x    0   1    2   3   4
                  y    0   0.4  0.8 1.2 1.6
  Amazon  2       x    0   2    4   6   8
                  y    0   1    2   3   4  

Is there a creative way to add a 3rd row to each Name-Segment axis calculating the slope at each point?

Slope calculated as: change in y/change in x

Expected output:

     Name  Segment  Axis  1   2    3   4   5
     Amazon  1       x    0   1    2   3   4
                     y    0   0.4  0.8 1.2 1.6
                    Slope
     Amazon  2       x    0   2    4   6   8
                     y    0   1    2   3   4  
                    Slope

I realize this might be a tricky question, just looking for some ideas here at the very least.

Upvotes: 0

Views: 139

Answers (1)

Cimbali
Cimbali

Reputation: 11395

(I’m assuming your 3 first columns are actually index, as they seem to be displayed. If not, you first need to do df.set_index(['Name', 'Segment', 'Axis']))

You can easily access each coordinate with .xs − here we’re basically saying « access the data labeled 'x' in the index level Axis) »:

>>> df
                     1    2    3    4    5
Name   Segment Axis                       
Amazon 1       x     0  1.0  2.0  3.0  4.0
               y     0  0.4  0.8  1.2  1.6
       2       x     0  2.0  4.0  6.0  8.0
               y     0  1.0  2.0  3.0  4.0
>>> df.xs('x', 'index', 'Axis')
                1    2    3    4    5
Name   Segment                       
Amazon 1        0  1.0  2.0  3.0  4.0
       2        0  2.0  4.0  6.0  8.0

Then the change in each coordinate can be computed with .diff(), and we just divide to get the slopes:

>>> slopes = df.xs('x', 'index', 'Axis').diff(axis='columns') / df.xs('y', 'index', 'Axis').diff(axis='columns')
>>> slopes
                 1    2    3    4    5
Name   Segment                        
Amazon 1       NaN  2.5  2.5  2.5  2.5
       2       NaN  2.0  2.0  2.0  2.0

(of course the first value is NaN as there are no differences with element -1, if however you want to compute the difference with the next element use .diff(axis='columns', periods=-1))

Then, this is not really elegant, but you need to add back the 3rd level of index with pd.concat, and reorder the index levels as in the original index:

>>> slopes = pd.concat({'slope': slopes}, names=['Axis']).reorder_levels(df.index.names)
>>> slopes
                       1    2    3    4    5
Name   Segment Axis                         
Amazon 1       slope NaN  2.5  2.5  2.5  2.5
       2       slope NaN  2.0  2.0  2.0  2.0

Then you’re pretty much done:

>>> df.append(slopes).sort_index()
                        1    2    3    4    5
Name   Segment Axis                          
Amazon 1       slope  NaN  2.5  2.5  2.5  2.5
               x      0.0  1.0  2.0  3.0  4.0
               y      0.0  0.4  0.8  1.2  1.6
       2       slope  NaN  2.0  2.0  2.0  2.0
               x      0.0  2.0  4.0  6.0  8.0
               y      0.0  1.0  2.0  3.0  4.0

Upvotes: 1

Related Questions