vhd

Reputation: 2068

How to normalise a dataset for linear/multiple regression in Python

I am using a dataset to make predictions with multivariable regression. I have to predict employees' salaries from independent variables such as gender, percentage, date of birth, marks in different subjects, degree, and specialization.

Numeric parameters (e.g. marks and percentages in different subjects) can be used directly in the regression model. But how do we normalize the non-numeric parameters (gender, date of birth, degree, specialization)?

P.S.: I am using scikit-learn, the machine learning package for Python.

Upvotes: 3

Views: 1240

Answers (3)

Dave

Reputation: 680

You want to encode your categorical parameters.

Note that a date is not a categorical parameter! Convert it into a Unix timestamp (seconds since the epoch) and you have a nice numeric parameter on which you can regress.
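A minimal sketch of that conversion, using only the standard library. The helper name `to_unix_timestamp`, the `YYYY-MM-DD` input format, and the sample dates are assumptions for illustration, not part of the question's data. Dates are treated as midnight UTC, and dates of birth before 1970 simply come out negative, which is fine for regression.

```python
from datetime import datetime, timezone

def to_unix_timestamp(date_string, fmt="%Y-%m-%d"):
    """Convert a date string to seconds since 1970-01-01 UTC (hypothetical helper)."""
    # Attach UTC explicitly so the result does not depend on the local time zone.
    dt = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    return dt.timestamp()

# Hypothetical dates of birth; the real column and its format may differ.
dates_of_birth = ["1990-01-01", "1985-06-15", "1969-12-31"]
dob_feature = [to_unix_timestamp(d) for d in dates_of_birth]
print(dob_feature)
```

Timestamps in seconds are large numbers, so rescaling the resulting column (for example to years) may make the fitted coefficient easier to read.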

Upvotes: 1

sparc_spread

Reputation: 10833

"Normaliz[ing] non-numeric parameters" is actually a huge area of regression. The most common treatment is to turn each categorical into a set of binary variables called dummy variables.

Each categorical with n values should be converted into n-1 dummy variables. So for example, for gender, you might have one variable, "female", that would be either 0 or 1 at each observation. Why n-1 and not n? Because you want to avoid the dummy variable trap, where the intercept column of all 1s can be reconstructed from a linear combination of your dummy columns. In relatively non-technical terms, that's bad because the columns become perfectly collinear, so the least-squares coefficients are no longer uniquely determined.

I am not very familiar with the scikit-learn library, but whatever methods you use, make sure that each categorical becomes n-1 new columns, not n.
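One way to get n-1 columns per categorical is sketched below with pandas rather than scikit-learn, using `pandas.get_dummies` with `drop_first=True`. The data frame, its column names, and its values are made up for illustration. Note that `drop_first` drops the first category in sorted order, so here gender becomes a single `gender_male` column rather than "female".

```python
import pandas as pd

# Hypothetical employee data; column names and values are assumptions.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "degree": ["BSc", "MSc", "PhD", "BSc"],
    "percentage": [72.5, 81.0, 90.2, 65.3],
})

# drop_first=True keeps n-1 dummy columns for each categorical with n values,
# which avoids the dummy variable trap when the model has an intercept.
encoded = pd.get_dummies(
    df, columns=["gender", "degree"], drop_first=True, dtype=int
)
print(encoded)
```

Here gender (2 values) yields 1 dummy column and degree (3 values) yields 2, while the numeric `percentage` column passes through unchanged.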

Upvotes: 0

Muhammad Noman

Reputation: 1366

I hope this helps. The full description of how to use the sklearn.preprocessing.normalize function is available at this link:

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html
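As a rough sketch of what that function does: with its default settings, `normalize` rescales each row of a numeric array to unit L2 norm. The array below is made up for illustration. The function expects numeric input, so non-numeric columns such as gender or degree would have to be encoded as numbers before it could be applied.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical numeric features: one row per sample.
X = np.array([[3.0, 4.0],
              [1.0, 0.0]])

# Defaults are norm="l2" and axis=1: each row is divided by its Euclidean length.
X_unit = normalize(X)
print(X_unit)
```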

Upvotes: 0
