Reputation: 969
I have a dataset of 106 individuals of two types, a and b, with various variables such as age and gender. I want to fit a linear model that predicts whether each individual is of type a or type b based on the covariates.
I read in the values for age, gender and the type label for each individual using:
`data = read.xlsx("spreadsheet.xlsx",2, as.is = TRUE)`
age = data$age
gender = data$gender
type = data$type
where each is of the form:
age = [28, 30, 19, 23 etc]
gender = [male, male, female, male etc]
type = [a b b b]
Then I try to set up the model using:
model1 = lm(type ~ age + gender)
but I get these warnings:
Warning messages:
1: In model.response(mf, "numeric") :
using type="numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors
I've tried changing the format of type, age and gender using:
age = as.numeric(as.character(age))
gender = as.character(gender)
type = as.character(type)
But this doesn't work!
Upvotes: 12
Views: 47546
Reputation: 3554
You can't fit a linear regression with a factor as the response variable, which is what you are attempting here (type is your response variable). lm() requires a numeric response. Because your outcome is a two-level category, you should look at classification models instead.
As Roland points out, you may wish to start by recoding your "type" variable as a logical (binary) variable. Rather than a factor called "type" with two levels "a" and "b", you might create a new variable called "is.type.a" that contains TRUE or FALSE.
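One way to do that recoding is sketched below. The data frame is a made-up stand-in for your spreadsheet (the four rows come from the example values in the question), so treat it as an illustration rather than your real data:

```r
# Hypothetical toy data standing in for the spreadsheet.
data <- data.frame(
  age    = c(28, 30, 19, 23),
  gender = c("male", "male", "female", "male"),
  type   = c("a", "b", "b", "b"),
  stringsAsFactors = FALSE
)

# TRUE where the individual is of type "a", FALSE otherwise.
# The comparison works whether "type" is stored as character or as a factor.
data$is.type.a <- data$type == "a"

str(data$is.type.a)
```

Keeping the new column inside the data frame, rather than as a loose vector, lets you pass data= to the modelling function later.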
You could then fit a logistic regression, which models the response with a binomial distribution:
model <- glm(is.type.a ~ age + gender,data=data,family="binomial")
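To see the whole thing end to end, here is a minimal sketch on simulated data. The 106 rows, the simulated relationship between age and type, and the 0.5 cutoff are all assumptions made purely for illustration; with your own data you would skip the simulation step and pick a cutoff that suits your problem.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for the real spreadsheet (assumption: 106 rows).
n <- 106
data <- data.frame(
  age    = sample(18:60, n, replace = TRUE),
  gender = factor(sample(c("male", "female"), n, replace = TRUE))
)

# Assumed relationship, for illustration only: older individuals are
# more likely to be of type a.
p.true <- plogis(-3 + 0.08 * data$age)
data$is.type.a <- runif(n) < p.true

# Logistic regression: models the probability that is.type.a is TRUE.
model <- glm(is.type.a ~ age + gender, data = data, family = "binomial")
print(summary(model))

# Fitted probabilities of being type a, then a hard label at a 0.5 cutoff.
prob.a <- predict(model, type = "response")
predicted.type <- ifelse(prob.a > 0.5, "a", "b")

# Cross-tabulate predicted labels against the actual ones.
actual.type <- ifelse(data$is.type.a, "a", "b")
print(table(predicted = predicted.type, actual = actual.type))
```

Note that predict() returns values on the log-odds scale by default; type = "response" is what gives you probabilities.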
Upvotes: 27