EmJ

Reputation: 4608

How to quickly normalise data in pandas dataframe?

I have a pandas dataframe as follows.

import pandas as pd
df = pd.DataFrame({
               'A':[1,2,3],
               'B':[100,300,500],
               'C':list('abc')
             })
print(df)
   A    B  C
0  1  100  a
1  2  300  b
2  3  500  c

I want to normalise the entire dataframe. Since column C is not a numeric column, this is what I currently do (i.e. remove C first, normalise the data, then add the column back).

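from sklearn import preprocessing

# Set the non-numeric column aside, scale the rest, then add it back.
df_new = df.drop('C', axis=1)
df_c = df[['C']]
x = df_new.values  # returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
# Keep the original column names and index when rebuilding the dataframe.
df_new = pd.DataFrame(x_scaled, columns=df_new.columns, index=df_new.index)
df_new['C'] = df_c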

However, I am sure there is an easier way of doing this in pandas (i.e. given the names of the columns that I do not need to normalise, normalise the rest directly).

I am happy to provide more details if needed.

Upvotes: 1

Views: 123

Answers (2)

Alex

Reputation: 69

If you want to apply any other function to selected columns of the data frame, you can use df[columns] = df[columns].apply(func), where columns is a list of column names and func is applied to each column in turn.
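As a minimal sketch of this pattern, the snippet below applies a min-max function to hand-picked columns of the dataframe from the question. The helper name min_max and the choice of columns are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [100, 300, 500],
    'C': list('abc'),
})

# Hypothetical helper: scale one column (a Series) to the range [0, 1].
def min_max(col):
    return (col - col.min()) / (col.max() - col.min())

# Columns to normalise, listed by hand here; 'C' is left untouched.
columns = ['A', 'B']
df[columns] = df[columns].apply(min_max)
print(df)
```

Any function that takes a Series and returns a Series of the same length could be swapped in for min_max.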

Upvotes: 1

jezrael

Reputation: 862511

Use DataFrame.select_dtypes to select only the numeric columns, normalise them by subtracting the column minimum and dividing by the column range (maximum minus minimum), and then assign only the normalised columns back:

import numpy as np

# Select numeric columns only, then min-max scale them column by column.
df1 = df.select_dtypes(np.number)
df[df1.columns] = (df1 - df1.min()) / (df1.max() - df1.min())
print(df)
     A    B  C
0  0.0  0.0  a
1  0.5  0.5  b
2  1.0  1.0  c
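Since the question asks about passing the names of columns that should not be normalised, the same idea could be wrapped in a small function. This is only a sketch: the function name normalise_except and its exclude parameter are hypothetical, and it returns a copy rather than modifying df in place.

```python
import numpy as np
import pandas as pd

def normalise_except(df, exclude=()):
    """Return a copy of df with numeric columns min-max scaled, skipping `exclude`."""
    out = df.copy()
    # Numeric columns minus the ones the caller wants left alone.
    cols = out.select_dtypes(np.number).columns.difference(list(exclude))
    num = out[cols]
    out[cols] = (num - num.min()) / (num.max() - num.min())
    return out

df = pd.DataFrame({'A': [1, 2, 3], 'B': [100, 300, 500], 'C': list('abc')})
# Scale every numeric column except 'B'; 'C' is skipped because it is not numeric.
result = normalise_except(df, exclude=['B'])
print(result)
```

Note that a constant column has a range of zero, so this formula would yield NaN for it; handle that case separately if it can occur in your data.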

Upvotes: 1
