Edamame

Reputation: 25376

Ignore string columns while doing

I am using the following code to normalize a pandas DataFrame:

df_norm = (df - df.mean()) / (df.max() - df.min())

This works fine when all columns are numeric. However, I now have some string columns in df, and the normalization above raises an error. Is there a way to perform this normalization only on the numeric columns of a DataFrame, keeping the string columns unchanged?

Upvotes: 7

Views: 6737

Answers (1)

LateCoder

Reputation: 2283

You can use select_dtypes to select only the numeric columns and compute the normalization on those:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': [4, 5, 6]})

df

   a  b  c
0  1  a  4
1  2  b  5
2  3  c  6

df_num = df.select_dtypes(include='number')

df_num

   a  c
0  1  4
1  2  5
2  3  6
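If you want to check which columns will be normalized and which will be left alone before changing anything, a minimal sketch on the same example frame could look like this (the names num_cols and other_cols are just illustrative):

```python
import pandas as pd

# Same example frame as above: two numeric columns and one string column
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': [4, 5, 6]})

# Columns that will be normalized (numeric dtypes only)
num_cols = df.select_dtypes(include='number').columns
# Columns that will be left unchanged (everything else, e.g. strings)
other_cols = df.select_dtypes(exclude='number').columns

print(list(num_cols))    # ['a', 'c']
print(list(other_cols))  # ['b']
```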

Then normalize those columns and assign the result back to the original df:

df_norm = (df_num - df_num.mean()) / (df_num.max() - df_num.min())


df[df_norm.columns] = df_norm

df

     a  b    c
0 -0.5  a -0.5
1  0.0  b  0.0
2  0.5  c  0.5
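If you need this in more than one place, the same steps can be wrapped in a small helper. This is a minimal sketch; normalize_numeric is a hypothetical name, and unlike the code above it works on a copy so the original df is not modified:

```python
import pandas as pd


def normalize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Mean-normalize the numeric columns and leave all other columns as-is.

    Hypothetical helper: returns a new DataFrame; the input is not modified.
    """
    out = df.copy()
    # Pick only the numeric columns; string columns are never touched
    num_cols = out.select_dtypes(include='number').columns
    num = out[num_cols]
    # Same formula as in the question, applied column-wise
    out[num_cols] = (num - num.mean()) / (num.max() - num.min())
    return out


df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': [4, 5, 6]})
df_norm = normalize_numeric(df)
print(df_norm)
```

Note that a numeric column with a constant value has max() - min() == 0, so it comes out as NaN with this formula.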

Upvotes: 14
