Reputation: 219
I have a cell type variable with 12 columns and 20000 rows. I call it Atotal:
Atotal= [ATY1;ATY2;ATY3;ATY4;ATY5;ATY6;ATY7;ATY8;ATY9;ATY10;ATY11;ATY12;ATY13;ATY14;ATY15;ATY16;ATY17];
Atotal={ 972 1 0 0 0 0 0 21 60 118 60110 2001
973 0 0 1 0 0 0 15 46 1496 60110 2001
980 0 0 0 0 1 0 4 68 142 40502 2001
994 1 0 0 0 0 0 13 33 86 81101 2001
995 0 0 0 1 0 0 9 55 183 31201 2001
1024 1 0 0 0 0 0 10 26 3 80803 2001}
I get my dependent and independent variables from there:
Y1=cell2mat(Atotal(:,2));
X1=cell2mat(Atotal(:,3));
And then I regress them. Since my dependent variable Y1 is binary and my independent variable X1 is also categorical, I use the following code, though I am still not sure it is the correct one.
mdl1 = fitlm(X1,Y1,'CategoricalVars',logical([1]));
Then I add more dummies and try the same code:
X2=cell2mat(Atotal(:,4));
X3=cell2mat(Atotal(:,5));
X4=cell2mat(Atotal(:,6));
X5=cell2mat(Atotal(:,7));
mdl2 = fitlm(X1,X2,X3,X4,X5,Y1,'CategoricalVars',logical([1,2,3,4,5]));
But now it gives me a lot of errors:
Error using internal.stats.parseArgs (line 42)
Parameter name must be text.
Error in LinearModel.fit (line 849)
[intercept,predictorVars,responseVar,weights,exclude, ...
Error in fitlm (line 117)
model = LinearModel.fit(X,varargin{:});
Could someone help me? Thank you
Upvotes: 0
Views: 3020
Reputation: 390
I think there are two problems with your code.
The first problem is that fitlm expects the following arguments:
mdl = fitlm(X,y,modelspec)
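which basically means that you have to collect your predictor variables into one matrix and pass it as the first argument, followed by the response. Anything after the second argument is parsed as a model specification or as name-value options, which is why passing X3, X4, ... as separate arguments ends in "Parameter name must be text." So you should do the following: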
X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, ...)
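The second problem is that for the CategoricalVars argument, fitlm expects either a logical vector (true where the variable is categorical, false where it is continuous) or a numeric index vector. Your logical([1,2,3,4,5]) mixes the two forms; it only happens to mark all five columns because every nonzero value converts to true. So the correct usage is: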
X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, 'CategoricalVars',logical([1,1,1,1,1]))
or
X = [X1, X2, X3, X4, X5];
fitlm(X, Y1, 'CategoricalVars', [1,2,3,4,5])
The above code snippets should work properly.
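If you want to check this without your own data, here is a minimal sketch on made-up data. The sample size, the names d1, d2, d3 and the probabilities used to generate y are all assumptions for illustration, not anything from your Atotal. It should show that the logical mask and the index vector describe the same model:

```matlab
% Synthetic data, for illustration only (not the data from the question).
rng(0);                                   % reproducible random numbers
n  = 200;
d1 = double(rand(n,1) > 0.5);             % three independent 0/1 dummies
d2 = double(rand(n,1) > 0.5);
d3 = double(rand(n,1) > 0.5);
y  = double(rand(n,1) < 0.3 + 0.3*d1 - 0.1*d2);   % binary response

X = [d1, d2, d3];                         % all predictors in ONE matrix

% Same model, specified two ways.
mdlMask  = fitlm(X, y, 'CategoricalVars', logical([1 1 1]));
mdlIndex = fitlm(X, y, 'CategoricalVars', [1 2 3]);

disp(mdlMask.Coefficients)                % intercept plus one row per dummy
```

Both calls go through the same fitting routine, so the two coefficient tables should agree.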
However, you could consider declaring your categorical variables as categorical arrays (available in MATLAB R2013b and later). In this case you would do the following:
X1 = categorical(cell2mat(Atotal(:,3)));
X2 = categorical(cell2mat(Atotal(:,4)));
X3 = categorical(cell2mat(Atotal(:,5)));
X4 = categorical(cell2mat(Atotal(:,6)));
X5 = categorical(cell2mat(Atotal(:,7)));
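% categorical predictors must be passed to fitlm in a table, not a matrix
tbl = table(X1, X2, X3, X4, X5, Y1);
fitlm(tbl, 'Y1 ~ X1 + X2 + X3 + X4 + X5')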
The advantage of this approach is that MATLAB knows that your Xi variables are categorical and treats them accordingly, so you do not have to specify the CategoricalVars argument every time you want to run a regression.
Finally, the MATLAB documentation for the fitlm function is really good, with a lot of examples, so check that out too.
Note: as others have mentioned in the comments, you should also consider running a logit regression, since your response variable is binary. In this case you would estimate your model as follows:
X = [X1, X2, X3, X4, X5];
fitglm(X, Y1, 'Distribution', 'binomial', 'Link', 'logit')
However, if you do this, be sure to understand what a logistic model is, what its assumptions are, and how its coefficients are interpreted.
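To make the interpretation point a bit more concrete, here is a rough sketch, again on made-up data (the single dummy d1 and the true coefficients -1 and 1.5 are assumptions chosen for illustration). It fits a logit with fitglm and converts the estimates from the log-odds scale into an odds ratio and into fitted probabilities:

```matlab
% Synthetic data, for illustration only.
rng(1);
n  = 500;
d1 = double(rand(n,1) > 0.5);             % 0/1 dummy predictor
p  = 1 ./ (1 + exp(-(-1 + 1.5*d1)));      % assumed true P(y = 1) under a logit model
y  = double(rand(n,1) < p);               % binary response

% A 0/1 dummy gives the same fit whether or not it is flagged as categorical,
% so CategoricalVars is left out here to keep the sketch short.
mdl = fitglm(d1, y, 'Distribution', 'binomial', 'Link', 'logit');

b         = mdl.Coefficients.Estimate;    % [intercept; slope] on the log-odds scale
oddsRatio = exp(b(2));                    % odds of y = 1 when d1 = 1, relative to d1 = 0
pHat      = predict(mdl, [0; 1]);         % fitted P(y = 1) for d1 = 0 and d1 = 1
```

The raw coefficient b(2) is a change in log-odds, not in probability; the difference pHat(2) - pHat(1) is the quantity that is comparable to a slope from fitlm.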
Upvotes: 2