user3557054

Reputation: 219

Regress categorical variables in Matlab

I have a cell type variable with 12 columns and 20000 rows. I call it Atotal:

Atotal= [ATY1;ATY2;ATY3;ATY4;ATY5;ATY6;ATY7;ATY8;ATY9;ATY10;ATY11;ATY12;ATY13;ATY14;ATY15;ATY16;ATY17];

Atotal={   972   1  0 0 0 0 0  21   60  118  60110  2001
           973   0  0 1 0 0 0  15   46  1496 60110  2001
           980   0  0 0 0 1 0  4    68  142  40502  2001
           994   1  0 0 0 0 0  13   33  86   81101  2001
           995   0  0 0 1 0 0  9    55  183  31201  2001
           1024  1  0 0 0 0 0  10   26  3    80803  2001}

I get my dependent and independent variables from there:

Y1=cell2mat(Atotal(:,2));
X1=cell2mat(Atotal(:,3));

And then I regress them. Since my dependent variable Y1 is binary and my independent variable X1 is also categorical, I use the following code, though I am not sure it is correct.

mdl1 = fitlm(X1,Y1,'CategoricalVars',logical([1]));

Then I add more dummies and try the same code:

X2=cell2mat(Atotal(:,4));
X3=cell2mat(Atotal(:,5));
X4=cell2mat(Atotal(:,6));
X5=cell2mat(Atotal(:,7));

mdl2 = fitlm(X1,X2,X3,X4,X5,Y1,'CategoricalVars',logical([1,2,3,4,5]));

But now it gives me a lot of errors:

Error using internal.stats.parseArgs (line 42)
Parameter name must be text.

Error in LinearModel.fit (line 849)
            [intercept,predictorVars,responseVar,weights,exclude, ...

Error in fitlm (line 117)
model = LinearModel.fit(X,varargin{:});

Could someone help me? Thank you

Upvotes: 0

Views: 3020

Answers (1)

Martin Stancsics

Reputation: 390

I think there are two problems with your code.

The first problem is that fitlm expects the following arguments:

mdl = fitlm(X,y,modelspec)

which means that you have to collect your predictor variables into one matrix and pass that matrix as the first argument. So you should do the following:

X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, ...)

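The second problem is that for the CategoricalVars argument fitlm expects either a logical vector (true where the variable is categorical and false where it is continuous) or a numeric index vector listing the categorical columns. Your logical([1,2,3,4,5]) mixes the two forms; it happens to evaluate to all true because every nonzero value converts to true, but that is an accident. So the correct usage is: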

X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, 'CategoricalVars',logical([1,1,1,1,1]))

or

X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, 'CategoricalVars', [1,2,3,4,5])

The above code snippets should work properly.
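
To check that the two forms really describe the same model, here is a minimal sketch on synthetic data. The data are made up for illustration (they are not your Atotal), and the names mdlMask and mdlIndex are just placeholders:

```matlab
% Synthetic stand-in data (hypothetical, NOT the asker's Atotal)
rng(0);                                   % reproducible random numbers
n  = 200;
X  = double(rand(n, 5) > 0.5);            % five 0/1 dummy predictors in one matrix
Y1 = double(X(:,1) + 0.5*randn(n,1) > 0.5);   % binary response

% Logical mask: one true/false flag per column of X
mdlMask  = fitlm(X, Y1, 'CategoricalVars', true(1, 5));

% Index vector: the column numbers that are categorical
mdlIndex = fitlm(X, Y1, 'CategoricalVars', 1:5);

% Largest difference between the two sets of coefficient estimates
coefGap = max(abs(mdlMask.Coefficients.Estimate - mdlIndex.Coefficients.Estimate));
disp(coefGap)
```

Both calls take the single predictor matrix X first and the response second, which is what removes the "Parameter name must be text" error.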

However, you could consider declaring your categorical variables as categorical arrays (this requires Matlab R2013b or later). In this case you would do the following:

X1 = categorical(cell2mat(Atotal(:,3)));
X2 = categorical(cell2mat(Atotal(:,4)));
X3 = categorical(cell2mat(Atotal(:,5)));
X4 = categorical(cell2mat(Atotal(:,6)));
X5 = categorical(cell2mat(Atotal(:,7)));

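% fitlm(X, y) needs a numeric matrix X, so put the categorical predictors in a table.
% By default fitlm uses the last table variable (here Y1) as the response.
tbl = table(X1, X2, X3, X4, X5, Y1);
fitlm(tbl)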

The advantage of this approach is that Matlab knows that your Xi variables are categorical, and they will be treated accordingly, so you do not have to specify the CategoricalVars argument every time you want to run a regression.

Finally, the Matlab documentation of the fitlm function is really good with a lot of examples, so check that out too.

Note: as others have mentioned in the comments, you should also consider running a logit regression, since your response variable is binary. In this case you would estimate your model the following way:

X = [X1, X2, X3, X4, X5];
fitglm(X, Y1, 'Distribution', 'binomial', 'Link', 'logit')

However, if you do this, be sure you understand what a logistic model is, what its assumptions are, and how its coefficients should be interpreted.
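
As a rough illustration of the interpretation point, the sketch below fits a logit model to synthetic data and converts the coefficients to odds ratios. Everything here is an assumption made for the example: the data, the "true" coefficients used to simulate them, and the names oddsRatios and pHat. The 0/1 predictors are left numeric for simplicity, which gives the same fit as flagging a two-level dummy as categorical:

```matlab
% Synthetic data (hypothetical, for illustration only)
rng(1);
n = 500;
X = double(rand(n, 2) > 0.5);                 % two 0/1 dummy predictors
eta = -0.5 + 1.2*X(:,1) - 0.8*X(:,2);         % linear predictor on the log-odds scale
p = 1 ./ (1 + exp(-eta));                     % inverse logit gives probabilities
y = double(rand(n, 1) < p);                   % simulated binary response

% Logistic regression: binomial distribution with logit link
mdl = fitglm(X, y, 'Distribution', 'binomial', 'Link', 'logit');

% Coefficients are changes in log-odds; exp() turns them into odds ratios
oddsRatios = exp(mdl.Coefficients.Estimate(2:end));

% Predicted probability that y = 1 when the first dummy is 1 and the second is 0
pHat = predict(mdl, [1 0]);

disp(oddsRatios)
disp(pHat)
```

An odds ratio above 1 means the dummy raises the odds that the response is 1 relative to the reference level, and below 1 means it lowers them. Unlike the fitlm coefficients, these are not changes in probability.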

Upvotes: 2
