Mc Kevin

Reputation: 972

Linear regression on variables that do not scale directly with the output

I've been following a machine learning course on Coursera. So far, most of the linear regression models introduced use variables whose numerical values are positively correlated with the output.

Input: square feet of the house 
Output: house price.

However, I am trying to implement a multivariate regression model in which some of the variables have numerical values that are not directly proportional to the output.

Inputs:
- what day it is (Mon, Tue, ...)
- what holiday it is (NewYear, Xmas, ...)
- what month it is (Jan, Feb, ...)
- what time it is (0100, 1300, ...)

Output:
- number of visitors

Questions:

  1. For the variables what day it is, what holiday it is, and what month it is, I am using an enumeration and assigning a number to each value (NewYear = 1, Christmas = 2, etc.). Is it better to do it this way, or to have separate variables (IsNewYear, IsChristmas, etc.)?

  2. I understand that raising a variable to higher powers can give a better fit, which is what I want for the holidays variable. Are there any methods that let the computer learn the best order by itself?

  3. Are there any existing C# libraries that allow different orders of power for different variables? (e.g. 13 for holidays and quadratic for the time of the day)

Thanks.

Upvotes: 1

Views: 104

Answers (1)

lejlot

Reputation: 66815

For the variables what day it is, what holiday it is, and what month it is, I am using an enumeration and assigning a number to each value (NewYear = 1, Christmas = 2, etc.). Is it better to do it this way, or to have separate variables (IsNewYear, IsChristmas, etc.)?

Separate variables are better. You should never encode an ordering in a variable whose values do not follow arithmetic. With NewYear=1, Christmas=2, Thanksgiving=3, the model would be told that Christmas = (Thanksgiving + NewYear) / 2, which is not something you want. One-hot encoding (IsNewYear, IsChristmas, etc.) is preferable because it does not encode false knowledge.
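To make the difference concrete, here is a minimal sketch in plain Python (not C#, and not tied to any particular library). The visitor counts are made up for illustration, and the helper names `one_hot`, `fit_line`, and `group_means` are hypothetical. It fits a straight line to the integer codes and compares that with one weight per holiday, which is what least squares gives for one-hot indicators with no intercept (each weight ends up being that holiday's mean).

```python
HOLIDAYS = ["NewYear", "Christmas", "Thanksgiving"]

# Made-up (holiday, visitors) observations, for illustration only.
DATA = [
    ("NewYear", 90.0), ("NewYear", 110.0),
    ("Christmas", 380.0), ("Christmas", 420.0),
    ("Thanksgiving", 140.0), ("Thanksgiving", 160.0),
]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 in the slot for `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def group_means(data, categories):
    """Least-squares weights for one-hot inputs without an intercept."""
    return [
        sum(y for h, y in data if h == c) / sum(1 for h, _ in data if h == c)
        for c in categories
    ]


# Integer encoding: NewYear=1, Christmas=2, Thanksgiving=3.
codes = {name: i + 1 for i, name in enumerate(HOLIDAYS)}
a, b = fit_line([codes[h] for h, _ in DATA], [y for _, y in DATA])
int_pred = {h: a + b * codes[h] for h in HOLIDAYS}

# One-hot encoding: prediction is the dot product of weights and indicators.
weights = group_means(DATA, HOLIDAYS)
onehot_pred = {
    h: sum(w * x for w, x in zip(weights, one_hot(h, HOLIDAYS)))
    for h in HOLIDAYS
}

for h in HOLIDAYS:
    print(h, round(int_pred[h], 1), round(onehot_pred[h], 1))
```

With the integer codes, the line is forced to place Christmas between the other two holidays, so it badly underestimates the Christmas peak in this toy data; the one-hot version gives each holiday its own independent level.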

I understand that raising a variable to higher powers can give a better fit, which is what I want for the holidays variable. Are there any methods that let the computer learn the best order by itself?

This is what non-linear methods do: kernel methods (kernelized linear regression, SVR), neural networks, regression trees/forests, etc.
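As an illustration of one of these families, here is a rough sketch of a tiny regression tree on a single input (hour of the day), written in plain Python rather than C#. The data and the function names `build_tree` and `predict` are assumptions made up for this example; a real project would use an existing library implementation. The point is only that the model finds the shape of the curve itself, with no hand-picked polynomial order.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth=0, max_depth=3):
    """Recursively split on x at the threshold that minimizes squared error."""
    if depth >= max_depth or len(set(xs)) < 2:
        return {"value": mean(ys)}  # leaf: predict the mean of its samples

    best_cost, best_t = None, None
    # Skipping the smallest x keeps both sides of every split non-empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t

    left = [(x, y) for x, y in zip(xs, ys) if x < best_t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= best_t]
    return {
        "threshold": best_t,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def predict(tree, x):
    """Walk down the tree until a leaf is reached."""
    while "value" not in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["value"]


# Made-up data: few visitors at night, a peak around midday.
hours = [1, 5, 9, 13, 17, 21]
visitors = [5.0, 10.0, 80.0, 120.0, 90.0, 20.0]

tree = build_tree(hours, visitors, max_depth=3)
for h in hours:
    print(h, predict(tree, h))
```

A single straight line cannot be low at both ends and high in the middle, whereas the tree captures the midday peak by splitting the hour range into pieces. The `max_depth` parameter controls how much flexibility the model gets.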

Are there any existing C# libraries that allow different orders of power for different variables? (e.g. 13 for holidays and quadratic for the time of the day)

You should not think about it in these terms. You are not supposed to fit powers by hand; rather, give the model enough flexibility to fit higher orders by itself (see the previous point).

Upvotes: 2
