tj judge
tj judge

Reputation: 616

Extract data based on various conditions

I have a dataframe:

    Name    Segment   Axis    1      2      3      4      5
0  Amazon        1  slope  NaN     100     120    127    140
1  Amazon        1      x  0.0     1.0     2.0    3.0    4.0
2  Amazon        1      y  0.0     0.4     0.8    1.2    1.6
3  Amazon        2  slope  NaN     50      57     58     59
4  Amazon        2      x  0.0     2.0     4.0    6.0    8.0
5  Amazon        2      y  0.0     1.0     2.0    3.0    4.0

df2:

  Name  Segment  Optimal Cost
  Amazon   1      115
  Amazon   2      60
  Netflix  1      100
  Netflix  2      110

I am trying to compare the slope values in the axis column to the corresponding optimal cost values and extract the slope, x and y values.

The rule is: Find the last first slope value greater than its corresponding optimal cost

If there is no value greater than optimal cost, then report where slope is zero.

If there are only values greater than optimal cost, then report highest y value

Expected output:

      Name    Segment   slope   x    y     
 0  Amazon        1      120    2   0.8
 1  Amazon        2      NaN    0   0

With help of @wwnde . Current code is

s=df.set_index(['Name' , 'Segment','Axis']).stack().unstack('Axis')
s=s.dropna(subset=["slope"]).sort_values("slope").reset_index(level=2, drop=True)
df3=pd.merge(s, df2, on=['Name',  'Segment'], how='left')
df3[df3['slope']>df3['Optimal_Cost']].groupby(['Name', 
'Segment']).first().reset_index()

But this only returns values if it finds the last first slope value greater than its corresponding optimal cost.

Upvotes: 0

Views: 67

Answers (1)

Anurag Dabas
Anurag Dabas

Reputation: 24322

Let's continue with @wwnde solution with some changes in it:

s=df.set_index(['Name','Segment','Axis']).stack().unstack(2)
s=s.sort_values("slope").reset_index(level=2, drop=True) 
#In above code we don't have to drop nan
out=pd.merge(s, df2, on=['Name',  'Segment'], how='left')
cond=out['slope'].gt(out['Optimal Cost']) | out['slope'].isna()
#make changes in condition to include nan's
out=out[cond].groupby(['Name','Segment'],as_index=False).first().drop('Optimal Cost',1)

Output of out:

    Name    Segment     slope   x       y
0   Amazon  1           120.0   2.0     0.8
1   Amazon  2           NaN     0.0     0.0 

Upvotes: 3

Related Questions