Mainland
Mainland

Reputation: 4564

Python dataframe find difference between min and max in a row

I have a dataframe of many columns as given below

df = 

index  P1   Q1   W1  P2   Q2   W2  P3   Q3   W3 
0      1    -1    2  3    0   -4   -4   4    0  
1      2    -5    8  9    3   -7   -8   9    6 
2     -4    -5    3  4    5   -6   -7   8    8 

I want to compute row wise difference between max and min in P columns.

df['P_dif'] =  max (P1,P2,P3) - min (P1,P2,P3) 

My expected output

df = 

index  P1   Q1   W1  P2   Q2   W2  P3   Q3   W3    P_dif
0      1    -1    2  3    0   -4   -4   4    0      7   # 3-(-4)
1      2    -5    8  9    3   -7   -8   9    6      17  # 9-(-8)      
2     -4    -5    3  4    5   -6   -7   8    8      11  # 4-(-7) 

My present code

df['P_dif'] = df[df.columns[::3]].apply(lambda g: g.max()-g.min())

My present output

print(df['P_dif'])
NaN
NaN
NaN

Upvotes: 0

Views: 1183

Answers (2)

turong
turong

Reputation: 1622

You can use DataFrame.max, DataFrame.min with axis=1 to calculate max and min value among columns

computed_cols = df.loc[:, ['P1', 'P2', 'P3']]
df['P_dif'] = computed_cols.max(axis=1) - computed_cols.min(axis=1)

Best,

Upvotes: 1

paxdiablo
paxdiablo

Reputation: 881363

Not sure why you're getting Nan values but I suspect it may be because you have rows with NaN in the Px columns (in the rows you hven't shown us in your example).

The reason I suspect this is because the lambda you're applying is operating on columns rather than rows, as per the following transcript:

>>> import pandas
>>> data = [[1,-1,2,3,0,-4,-4,4,0],[2,-5,8,9,3,-7,-8,9,6],[-4,-5,3,4,5,-6,-7,8,8]]
>>> df=pandas.DataFrame(data,columns=['P1','Q1','W1','P2','Q2','W2','P3','Q3','W3'])
>>> df
   P1  Q1  W1  P2  Q2  W2  P3  Q3  W3
0   1  -1   2   3   0  -4  -4   4   0
1   2  -5   8   9   3  -7  -8   9   6
2  -4  -5   3   4   5  -6  -7   8   8
>>> df[df.columns[::3]].apply(lambda g: g.max()-g.min())
P1    6 #  2 - -4 -> 6
P2    6 #  9 -  3 -> 6
P3    4 # -4 - -8 -> 4

Note the output specifying the P1, P2 and P3 values and the stuff I've added as comments to the right, to show that it's the maximal difference of the column rather than the row.


You can get the information you need with the following:

>>> numpy.ptp(numpy.array(df[['P1', 'P2', 'P3']]), axis=1)
array([7, 17, 11], dtype=int64)

I don't doubt someone more familar than I with Pandas and Numpy could improve on that so feel free to edit this answer if that's the case.

Upvotes: 2

Related Questions