I wrote the following code to normalize a few columns of a data frame:
import pandas as pd
train = pd.read_csv('test1.csv')
header = train.columns.values
print(train)
print(header)
inputs = header[0:3]
trainArr = train.as_matrix(inputs)
print(inputs)
trainArr[inputs] = trainArr[inputs].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
The printed output is:
v1 v2 v3 result
0 12 31 31 0
1 34 52 4 1
2 32 4 5 1
3 7 89 2 0
['v1' 'v2' 'v3' 'result']
['v1' 'v2' 'v3']
However, I got the following error:
trainArr[inputs] = trainArr[inputs].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
IndexError: arrays used as indices must be of integer (or boolean) type
Does anyone have any idea what I missed here? Thanks!
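The error can be reproduced without the CSV file. This sketch assumes a DataFrame built inline with the values printed above, and uses to_numpy() in place of as_matrix(), which was removed in pandas 1.0. Indexing the resulting NumPy array with the column-name array raises the same IndexError:

```python
import pandas as pd

# Same values as the question's test1.csv, built inline (assumption: no CSV file needed)
train = pd.DataFrame({"v1": [12, 34, 32, 7],
                      "v2": [31, 52, 4, 89],
                      "v3": [31, 4, 5, 2],
                      "result": [0, 1, 1, 0]})
inputs = train.columns.values[0:3]  # object array of labels: ['v1' 'v2' 'v3']

# to_numpy() stands in for as_matrix(), which newer pandas versions no longer have
train_arr = train[inputs].to_numpy()

error_message = None
try:
    # A plain NumPy array has no column labels, so string indices are rejected
    train_arr[inputs]
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```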
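The error comes from as_matrix: it returns a NumPy array, and a NumPy array can only be indexed by integer positions or boolean masks, not by column names such as 'v1'. So work on the DataFrame itself instead.
I think you can first select the first three columns with [:3], then create a subset of the DataFrame with train[header], and finally apply the function to those 3 columns: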
print (train)
v1 v2 v3 result
0 12 31 31 0
1 34 52 4 1
2 32 4 5 1
3 7 89 2 0
header = train.columns[:3]
print(header)
Index([u'v1', u'v2', u'v3'], dtype='object')
print (train[header])
v1 v2 v3
0 12 31 31
1 34 52 4
2 32 4 5
3 7 89 2
train[header] = train[header].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
print (train)
v1 v2 v3 result
0 -0.342593 -0.152941 0.706897 0
1 0.472222 0.094118 -0.224138 1
2 0.398148 -0.470588 -0.189655 1
3 -0.527778 0.529412 -0.293103 0
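The apply with a lambda is not strictly required. DataFrame reductions such as mean(), max() and min() already work column by column, and arithmetic between a DataFrame and the resulting Series aligns on column labels. A minimal sketch of that vectorized variant, assuming the same data built inline instead of read from test1.csv:

```python
import pandas as pd

# Same values as the question's test1.csv, built inline instead of read from disk
train = pd.DataFrame({"v1": [12, 34, 32, 7],
                      "v2": [31, 52, 4, 89],
                      "v3": [31, 4, 5, 2],
                      "result": [0, 1, 1, 0]})
header = train.columns[:3]

# Each column is centered on its own mean and divided by its own range (max - min);
# the Series returned by mean/max/min align with the DataFrame's columns.
sub = train[header]
train[header] = (sub - sub.mean()) / (sub.max() - sub.min())
print(train)
```

This should give the same normalized values as the apply version, and the result column stays untouched.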
But I think it is better to use iloc to select the first 3 columns:
print (train.iloc[:,:3])
v1 v2 v3
0 12 31 31
1 34 52 4
2 32 4 5
3 7 89 2
train.iloc[:,:3] = train.iloc[:,:3].apply(lambda x: (x - x.mean()) / (x.max() - x.min()))
print (train)
v1 v2 v3 result
0 -0.342593 -0.152941 0.706897 0
1 0.472222 0.094118 -0.224138 1
2 0.398148 -0.470588 -0.189655 1
3 -0.527778 0.529412 -0.293103 0
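If a NumPy array is really wanted, as in the question's as_matrix attempt, the columns have to be selected by position rather than by name, and the reductions need axis=0 so they run per column. A sketch under the same inline-data assumption, converting to float first so the normalized values are not truncated to integers:

```python
import numpy as np
import pandas as pd

# Same values as the question's test1.csv, built inline instead of read from disk
train = pd.DataFrame({"v1": [12, 34, 32, 7],
                      "v2": [31, 52, 4, 89],
                      "v3": [31, 4, 5, 2],
                      "result": [0, 1, 1, 0]})

# Float dtype so the normalized values fit into the array
arr = train.to_numpy(dtype=float)

# Columns are selected by integer slice, not by label; axis=0 reduces down each column
cols = arr[:, :3]
arr[:, :3] = (cols - cols.mean(axis=0)) / (cols.max(axis=0) - cols.min(axis=0))
print(arr)
```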