Reputation: 220
I'm trying to create a multiple linear regression model with this data:
  bweight            gestwks hyp    sex
1    2974   38.5200004577637   0 female
2    3270                 NA   0   male
3    2620 38.150001525878899   0 female
4    3751 39.799999237060497   0   male
5    3200 38.889999389648402   1   male
6    3673 40.970001220703097   0 female
To handle the string values "male" and "female", I convert them to the integers 1 and 0, like this:
male = 1*(sex == "male")
Then I fit the linear model, with bweight (baby weight) as the outcome variable:
lm2 = lm(bweight ~ gestwks + hyp + male)
But when I look at the parameters of the model, I get this (only part of the output is shown here):
Call:
lm(formula = bweight ~ gestwks + hyp + male)
Coefficients:
              (Intercept)  gestwks26.950000762939499
                  864.000                   -236.000
gestwks27.329999923706101    gestwks27.9899997711182
                    7.363                    146.469
gestwks28.040000915527301    gestwks30.5200004577637
                  184.469                    760.469
gestwks30.649999618530298  gestwks30.709999084472699
                  900.000                   -141.531
I expected to get a single coefficient for each predictor, not one for every value of gestwks. What am I doing wrong?
Upvotes: 1
Views: 833
Reputation: 19394
Before conducting any analysis, always explore your variables carefully. Pay attention to ranges and distributions for continuous variables and frequencies for categorical ones. Do this after importing the data.
In this case, the gestwks variable is not actually numeric. If you had looked at the output of str(my_data), where my_data is the name of your data frame, then you would have seen the potential problem with that variable. You probably need to revise the command to import the data. If it is correct, then you'll need to convert the variable into a numeric one using the appropriate command. Read the Warning in the help page of as.numeric.*
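As a small illustration of that check, here is a sketch using a toy data frame. The name my_data and the three rows are only for illustration (values rounded from the question), and gestwks is deliberately stored as a factor to mimic the suspected problem:

```r
# Toy data frame (illustrative only); gestwks is stored as a factor on purpose.
my_data <- data.frame(
  bweight = c(2974, 2620, 3751),
  gestwks = factor(c("38.52", "38.15", "39.80"))
)

# str() lists gestwks as a Factor rather than num, which is the warning sign.
str(my_data)

# The same check done programmatically.
gestwks_class <- class(my_data$gestwks)
gestwks_is_numeric <- is.numeric(my_data$gestwks)
print(gestwks_class)
print(gestwks_is_numeric)
```

When lm() meets a factor predictor it creates one dummy coefficient per level, which matches the long list of gestwks... coefficients in the question.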
Data management is a key part of your analysis.
Look carefully at gestwks
for strange looking values. table
can help if there aren't too many records, or look at the first and last few sorted values.
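One possible way to hunt for such values is sketched below. The vector gestwks_raw and its stray "n/a" entry are made up for illustration; they are not taken from the asker's file:

```r
# Hypothetical raw column: mostly numbers plus entries that are not numbers.
gestwks_raw <- factor(c("38.52", "NA", "38.15", "n/a", "40.97"))

# Frequency table: feasible when there are not too many distinct values.
print(table(gestwks_raw))

# First and last few sorted values; odd entries tend to sort to the ends.
values <- as.character(gestwks_raw)
print(head(sort(values), 3))
print(tail(sort(values), 3))

# Entries that fail to parse as numbers are the ones to investigate.
parsed <- suppressWarnings(as.numeric(values))
suspicious <- unique(values[is.na(parsed)])
print(suspicious)
```

If the suspicious entries turn out to be missing-value codes, fixing the import step is usually cleaner than patching the column afterwards.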
* as.numeric(levels(f))[f] or as.numeric(as.character(f)) is the recommended command.
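To see why the help page warns about this, here is a minimal sketch with a made-up factor f (values rounded from the question). Calling as.numeric() directly on a factor returns the internal level codes, not the numbers printed on screen:

```r
# Illustrative factor whose labels look like numbers.
f <- factor(c("38.52", "38.15", "39.80"))

# Pitfall: this returns the integer level codes, not the values.
codes <- as.numeric(f)

# Recommended conversions: both recover the actual numbers.
via_levels <- as.numeric(levels(f))[f]
via_char   <- as.numeric(as.character(f))

print(codes)
print(via_levels)
print(via_char)
```

The two recommended forms give the same result. The levels-based one converts each distinct level only once, which can matter for long vectors.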
Upvotes: 2
Reputation: 59
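gestwks is a factor; you need to convert it to numeric, for example with as.numeric(as.character(gestwks)), before you regress on it. Calling as.numeric directly on a factor returns its integer level codes rather than the values.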
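A rough sketch of the whole fix, using the six rows shown in the question with gestwks rounded to two decimals. The data frame name babies is made up here, and storing gestwks as a factor is an assumption meant to reproduce the problem:

```r
# The six rows from the question (gestwks rounded), with gestwks as a factor.
babies <- data.frame(
  bweight = c(2974, 3270, 2620, 3751, 3200, 3673),
  gestwks = factor(c("38.52", NA, "38.15", "39.80", "38.89", "40.97")),
  hyp     = c(0, 0, 0, 0, 1, 0),
  sex     = c("female", "male", "female", "male", "male", "female")
)

# Convert the factor labels back to numbers, going through character.
babies$gestwks <- as.numeric(as.character(babies$gestwks))

# Same 0/1 indicator as in the question.
babies$male <- 1 * (babies$sex == "male")

# Refit; the row with a missing gestwks is dropped by lm() by default.
lm2 <- lm(bweight ~ gestwks + hyp + male, data = babies)
coef_names <- names(coef(lm2))
print(coef_names)
```

With a numeric gestwks the model has a single slope for it. Five usable rows are far too few for meaningful estimates, so this only demonstrates the shape of the output.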
Upvotes: 0