snapo

Reputation: 704

Python pandas: normalize each DataFrame row using only that row's min and max, not the column's

My DataFrame currently has the following layout.

Source:

index   col1   col2   col3
row1     100     50      0
row2    -100     50    -25
row3       0      0      0
row4      -1     -1     -1
row5       1      1      1
row6    -100      0      1

My target is:

index   col1   col2   col3
row1    1.0    0.5    0.0
row2    0      1      0.5
row3    0      0      0
row4    0      0      0
row5    0      0      0
row6    0      0.99   1

What I tried, based on Stack Overflow answers:

Divides each row by its row sum, which is not min-max scaling within the row:

df = (df.T / df.T.sum()).T

Same result, again dividing each row by its row sum:

df = df.div(df.sum(axis=1), axis=0)

Scales each row to unit L2 norm (scikit-learn's Normalizer), again not row min-max scaling:

df.iloc[:,:] = Normalizer(norm='l2').fit_transform(df)

I also tried changing the axis arguments in df.div(df.sum(axis=1), axis=0), but as soon as I change either axis it throws an error.

From reading about the built-in pandas DataFrame functions, I can't see a simple, Pythonic way to achieve this without a complicated lambda in apply that first stores each row's min and max. The pandas documentation also advises against iterating over rows and changing values :-( so I am a bit lost and would appreciate some input.

Upvotes: 1

Views: 2019

Answers (1)

DYZ

Reputation: 57033

  1. Subtract the smallest element from each row.
  2. Divide the row by its range (the difference between the max and the min).
  3. If the range is 0, the division produces NaNs. Fill them with the original values.

Code:

df.subtract(df.min(axis=1), axis=0)\
  .divide(df.max(axis=1) - df.min(axis=1), axis=0)\
  .combine_first(df)
#       col1      col2  col3
#row1    1.0  0.500000   0.0
#row2    0.0  1.000000   0.5
#row3    0.0  0.000000   0.0
#row4   -1.0 -1.000000  -1.0
#row5    1.0  1.000000   1.0
#row6    0.0  0.990099   1.0
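Note that the output above keeps -1 and 1 for row4 and row5, whereas the target in the question shows 0 for those constant rows. The following is a self-contained sketch of the same approach that rebuilds the sample data from the question and shows both ways of handling zero-range rows. The variable names (row_min, row_range, scaled, keep_original, zero_filled) are illustrative, and the fillna(0.0) variant rests on the assumption that constant rows should map to 0, as the question's target suggests.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "col1": [100, -100, 0, -1, 1, -100],
        "col2": [50, 50, 0, -1, 1, 0],
        "col3": [0, -25, 0, -1, 1, 1],
    },
    index=["row1", "row2", "row3", "row4", "row5", "row6"],
)

# Per-row minimum and range (max - min), each a Series indexed by row label.
row_min = df.min(axis=1)
row_range = df.max(axis=1) - row_min

# Row-wise min-max scaling. axis=0 aligns the Series with the row index.
# Constant rows have a zero range, so they become NaN (0 / 0).
scaled = df.sub(row_min, axis=0).div(row_range, axis=0)

# Variant A (as in the answer above): restore the original values for constant rows.
keep_original = scaled.combine_first(df)

# Variant B (assumption: constant rows should become 0, matching the question's target).
zero_filled = scaled.fillna(0.0)

print(keep_original)
print(zero_filled)
```

One caveat with the second variant: fillna(0.0) would also overwrite NaNs that were already present in the input, so it is only a safe substitute when the data has no missing values.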

Upvotes: 3
