Reputation: 1692
I was confused about the usage of both .apply and lambda. Firstly, does .apply apply the desired change to all elements in all the columns specified, or to each column one by one? Secondly, does x in lambda x: iterate through every element in the specified columns, or through the columns separately? Thirdly, does x.min or x.max give us the minimum or maximum of all the elements in the specified columns, or the minimum and maximum elements of each column separately? Any answer explaining the whole process would make me more than grateful.
Thanks.
Upvotes: 2
Views: 1983
Reputation: 863166
I think it is best to avoid apply here, because it loops under the hood. Instead, work with a subset of the DataFrame selected by a list of columns:
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})
print(df)

# columns to rescale
c = ['B', 'C', 'D']
So first compute the minimum of each selected column (the maximum works the same way):
print (df[c].min())
B 4
C 2
D 0
dtype: int64
Then subtract and divide:
print ((df[c] - df[c].min()))
   B  C  D
0  0  5  1
1  1  6  3
2  0  7  5
3  1  2  7
4  1  0  1
5  0  1  0
print (df[c].max() - df[c].min())
B 1
C 7
D 7
dtype: int64
df[c] = (df[c] - df[c].min()) / (df[c].max() - df[c].min())
print (df)
   A    B         C         D  E  F
0  a  0.0  0.714286  0.142857  5  a
1  b  1.0  0.857143  0.428571  3  a
2  c  0.0  1.000000  0.714286  6  a
3  d  1.0  0.285714  1.000000  9  b
4  e  1.0  0.000000  0.142857  2  b
5  f  0.0  0.142857  0.000000  4  b
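As a rough cross-check, the sketch below rebuilds part of the sample data and compares the vectorized expression with an apply/lambda version like the one the question seems to describe. The names vectorized, via_apply and max_diff are illustrative only and do not come from the original posts.

```python
import pandas as pd

# Same sample values as above, limited to the columns that matter here
df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})
c = ['B', 'C', 'D']

# Vectorized: min() and max() return one value per column,
# and the arithmetic is aligned on the column labels
vectorized = (df[c] - df[c].min()) / (df[c].max() - df[c].min())

# apply: the lambda receives one whole column (a Series) per call
via_apply = df[c].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Largest absolute difference between the two results
max_diff = (vectorized - via_apply).abs().to_numpy().max()
print(max_diff)
```

If the two approaches agree, max_diff should be zero or negligibly small, which suggests that apply here only adds a Python-level loop over the columns.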
EDIT:
To debug apply, it is best to create a custom function:
def f(x):
    # apply calls f once per column; x is that column as a Series
    print(x)
    # x.min() is a scalar: the minimum of this column only
    print(x.min())
    # the expression below is a new Series: the rescaled column
    print((x - x.min()) / (x.max() - x.min()))
    return (x - x.min()) / (x.max() - x.min())

df[c] = df[c].apply(f)
print(df)
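A quieter way to see the same thing is to record what the function receives instead of printing it. This is only a sketch: the helper record and the list seen are hypothetical names, and it assumes the default axis=0 of DataFrame.apply. Older pandas releases may call the function an extra time on the first column, so the number of calls is not something to rely on.

```python
import pandas as pd

df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})

seen = []  # one (type name, column label, length) tuple per call

def record(x):
    # With the default axis=0, x is a whole column passed as a Series
    seen.append((type(x).__name__, x.name, len(x)))
    return x

df.apply(record)
print(seen)
```

Each recorded entry should describe a Series of length 6 named after one of the columns, so x in lambda x: is a column, not a single element, and x.min() is the minimum of that column alone.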
Upvotes: 1
Reputation: 929
Check whether the data are really being normalised. If the function ends up being applied to single values rather than to whole columns, x.min and x.max would simply be the min and max of that single value, and no normalisation would occur.
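One possible way to run that check, assuming the min-max formula and sample values used in the other answer (the names scaled and ok are illustrative), is to confirm that every rescaled column spans exactly 0 to 1:

```python
import pandas as pd

df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})
c = ['B', 'C', 'D']

scaled = (df[c] - df[c].min()) / (df[c].max() - df[c].min())

# After min-max scaling, each column's smallest value should be 0
# and its largest value should be 1
print(scaled.min())
print(scaled.max())

ok = bool((scaled.min() == 0).all() and (scaled.max() == 1).all())
print(ok)
```

A column that fails this check was either not rescaled or is constant, in which case the denominator is zero and the result is NaN.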
Upvotes: 1