Python: data frame error from lambda expression

Question

I wrote the following code to normalize a few columns of a data frame:

import pandas as pd

train = pd.read_csv('test1.csv')
header = train.columns.values
print(train)
print(header)

inputs = header[0:3]
trainArr = train.as_matrix(inputs)

print(inputs)
trainArr[inputs] = trainArr[inputs].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))

The printed output is:

   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0
['v1' 'v2' 'v3' 'result']
['v1' 'v2' 'v3']

However, I got the following error:

    trainArr[inputs] = trainArr[inputs].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
IndexError: arrays used as indices must be of integer (or boolean) type

Does anyone have an idea what I missed here? Thanks!

jezrael · Accepted Answer

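The error comes from as_matrix: it returns a plain NumPy array, which has no column names, so indexing it with the strings in inputs raises the IndexError (NumPy only accepts integer or boolean arrays as indices). I think you can instead stay with the DataFrame: first select the first three column names with [:3], then create a subset of the DataFrame with train[header]. Finally, apply the function to those 3 columns: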

print (train)
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0

header = train.columns[:3]
print(header)
Index([u'v1', u'v2', u'v3'], dtype='object')

print (train[header])
   v1  v2  v3
0  12  31  31
1  34  52   4
2  32   4   5
3   7  89   2

train[header] = train[header].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
print (train)
         v1        v2        v3  result
0 -0.342593 -0.152941  0.706897       0
1  0.472222  0.094118 -0.224138       1
2  0.398148 -0.470588 -0.189655       1
3 -0.527778  0.529412 -0.293103       0
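To reproduce both the error and the fix without test1.csv, here is a minimal sketch that builds the same four-row table inline (the DataFrame literal and the names train_arr and caught are illustrative assumptions). It uses to_numpy() because as_matrix() was deprecated and later removed in pandas 1.0. The array it returns has no column labels, so indexing it with column-name strings fails, while the same names work as DataFrame keys.

```python
import pandas as pd

# Hypothetical stand-in for test1.csv: the four rows printed in the question.
train = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})

# Column names as a plain NumPy array, like header[0:3] in the question.
inputs = train.columns.to_numpy()[0:3]

# to_numpy() replaces the removed as_matrix(); the result has no column labels.
train_arr = train.to_numpy()

caught = None
try:
    train_arr[inputs]  # arrays of strings are not valid NumPy indices
except IndexError as exc:
    caught = exc
    print("IndexError:", exc)

# The fix: index the DataFrame itself, where column names are valid keys.
header = train.columns[:3]
train[header] = train[header].apply(
    lambda x: (x - x.mean()) / (x.max() - x.min())
)
print(train)
```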

But I think it is better to use iloc to select the first 3 columns by position:

print (train.iloc[:,:3])
   v1  v2  v3
0  12  31  31
1  34  52   4
2  32   4   5
3   7  89   2

train.iloc[:,:3] = train.iloc[:,:3].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
print (train)
         v1        v2        v3  result
0 -0.342593 -0.152941  0.706897       0
1  0.472222  0.094118 -0.224138       1
2  0.398148 -0.470588 -0.189655       1
3 -0.527778  0.529412 -0.293103       0
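The per-column apply() is not strictly required, because arithmetic between a DataFrame and a Series in pandas aligns on column labels. The sketch below (same hypothetical inline data; the names raw, cols, block and arr are illustrative) shows that vectorized form. It also shows a NumPy-only variant for anyone who wants to keep the array route from the question: it works once columns are selected by position with arr[:, :3] instead of by name. Both are optional alternatives, not a correction of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test1.csv: the four rows printed in the question.
raw = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})

# Variant 1: vectorized pandas, no apply().
# mean(), max() and min() each return a Series indexed by column name, and
# the subtraction and division align on those names column by column.
train = raw.copy()
cols = train.columns[:3]
block = train[cols]
train[cols] = (block - block.mean()) / (block.max() - block.min())
print(train)

# Variant 2: plain NumPy. Arrays take integer positions, slices or boolean
# masks, not names, so the first three columns are selected by position
# and the statistics are reduced down each column (axis=0).
arr = raw.to_numpy(dtype=float)
sub = arr[:, :3]
arr[:, :3] = (sub - sub.mean(axis=0)) / (sub.max(axis=0) - sub.min(axis=0))
print(arr)
```

With this sample data, both variants should produce the same normalized values for v1, v2 and v3, and leave result untouched.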
