Huzo
Huzo

Reputation: 1692

Confused about the usage of .apply and lambda

After encountering this code: enter image description here

I was confused about the usage of both .apply and lambda. Firstly does .apply apply the desired change to all elements in all the columns specified or each column one by one? Secondly, does x in lambda x: iterate through every element in specified columns or columns separately? Thirdly, does x.min or x.max give us the minimum or maximum of all the elements in specified columns or minimum and maximum elements of each column separately? Any answer explaining the whole process would make me more than grateful.
Thanks.

Upvotes: 2

Views: 1983

Answers (2)

jezrael
jezrael

Reputation: 863166

I think here is the best avoid apply - loops under the hood and working with subset of DataFrame by columns from list:

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df)

c = ['B','C','D']

So first select minimal values of selected columns and similar maximal:

print (df[c].min())
B    4
C    2
D    0
dtype: int64

Then subtract and divide:

print ((df[c] - df[c].min()))
   B  C  D
0  0  5  1
1  1  6  3
2  0  7  5
3  1  2  7
4  1  0  1
5  0  1  0

print (df[c].max() - df[c].min())
B    1
C    7
D    7
dtype: int64

df[c] = (df[c] - df[c].min()) / (df[c].max() - df[c].min())
print (df)
   A    B         C         D  E  F
0  a  0.0  0.714286  0.142857  5  a
1  b  1.0  0.857143  0.428571  3  a
2  c  0.0  1.000000  0.714286  6  a
3  d  1.0  0.285714  1.000000  9  b
4  e  1.0  0.000000  0.142857  2  b
5  f  0.0  0.142857  0.000000  4  b

EDIT:

For debug apply is best create custom function:

def f(x):
    #for each loop return column
    print (x)
    #return scalar - min
    print (x.min())
    #return new Series - column
    print ((x-x.min())/ (x.max() - x.min()))
    return (x-x.min())/ (x.max() - x.min())

df[c] = df[c].apply(f)
print (df)

Upvotes: 1

glycoaddict
glycoaddict

Reputation: 929

Check if the data are really being normalised. Because x.min and x.max may simply take the min and max of a single value, hence no normalisation would occur.

Upvotes: 1

Related Questions