Reputation: 71
My question is rather simple, but I could not get it resolved after trying a lot of things.
I have two data frames.
> a
  col1 col2 col3 col4
1    1    2    1    4
2    2   NA    2    3
3    3    2    3    2
4    4    3    4    1
> b
  col1 col2 col3 col4
1    5    2    1    4
2    2   NA    2    3
3    3   NA    3    2
4    4    3    4    1
Can I do a lm(a ~ b) to fit the data in a and b?
If I do, how do I ignore the NA data?
Thanks, Dan
Upvotes: 2
Views: 16576
Reputation: 94182
If a and b are data frames and you want to regress the individual values in a on the corresponding values in b, then you need to convert them to vectors first, e.g.:
> lm(as.vector(as.matrix(a))~as.vector(as.matrix(b)))
Call:
lm(formula = as.vector(as.matrix(a)) ~ as.vector(as.matrix(b)))
Coefficients:
(Intercept) as.vector(as.matrix(b))
8.418239 -0.005241
Missing data is by default dropped - see help(lm) and the na.action parameter. The summary method on an lm object will tell you about dropped observations.
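To see the dropping in practice, here is a minimal sketch using the two data frames from the question, rebuilt by hand. The names long, fit and fit_ex are mine, not from the question. It stacks the values with unlist, fits with na.omit, and counts what was dropped; na.exclude is the variant that pads the residuals back to the original length.

```r
# Data frames from the question, typed in by hand
a <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(2, NA, 2, 3),
                col3 = c(1, 2, 3, 4), col4 = c(4, 3, 2, 1))
b <- data.frame(col1 = c(5, 2, 3, 4), col2 = c(2, NA, NA, 3),
                col3 = c(1, 2, 3, 4), col4 = c(4, 3, 2, 1))

# Stack every cell into one long vector per data frame (16 pairs)
long <- data.frame(y = unlist(a), x = unlist(b))

# na.omit drops every pair in which either value is NA
fit <- lm(y ~ x, data = long, na.action = na.omit)
nobs(fit)               # observations actually used: 14
length(fit$na.action)   # observations dropped: 2

# na.exclude fits the same model but pads residuals with NA
fit_ex <- lm(y ~ x, data = long, na.action = na.exclude)
length(residuals(fit_ex))  # 16, same length as the input
```

Keeping the residuals at full length, as na.exclude does, is convenient if you want to put them back into the original layout for mapping.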
Of course ignoring the spatial correlation likely to be present in spatial data will mean your inferences from the parameter estimates will be quite wrong. Map the residuals. And read a good book on spatial stats...
[Edit: oh, and the data frames have to be all numbers or the whole lot gets converted to characters and then... well, who knows...]
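A quick way to guard against that is to check the column types before converting. This is only a sketch: mixed is a made-up data frame and all_numeric is a hypothetical helper, not something from base R.

```r
# Made-up data frame with one numeric and one character column
mixed <- data.frame(x = c(1.5, 2.5), label = c("u", "v"))

# as.matrix() falls back to character as soon as one column is not numeric
is.character(as.matrix(mixed))   # TRUE

# Hypothetical helper: TRUE only if every column is numeric
all_numeric <- function(df) all(vapply(df, is.numeric, logical(1)))

all_numeric(mixed)        # FALSE, so do not feed this to as.vector(as.matrix(...))
all_numeric(mixed["x"])   # TRUE, safe to convert
```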
Edit:
Another way of getting vectors from data frames is just to use 'unlist':
> a=data.frame(matrix(runif(16),4,4))
> b=data.frame(matrix(runif(16),4,4))
> lm(a~b)
Error in model.frame.default(formula = a ~ b, drop.unused.levels = TRUE) :
invalid type (list) for variable 'a'
> lm(unlist(a)~unlist(b))
Call:
lm(formula = unlist(a) ~ unlist(b))
Coefficients:
(Intercept) unlist(b)
0.6488 -0.3137
I've not seen data.matrix before, thx Gavin.
Upvotes: 2
Reputation: 263332
Generally, the regression functions in R only use complete cases, so you usually do not need to do anything special to exclude rows with missing values. Your question seems a bit vague, and it is not clear why you are putting an entire matrix (or is that a data.frame?) on the left-hand side of a formula. The lm() function can do multivariate analyses, but people who want to do so will generally ask more specific questions.
> lm(a$col1 ~ b$col1+b$col2 +b$col3+b$col4)
Call:
lm(formula = a$col1 ~ b$col1 + b$col2 + b$col3 + b$col4)
Coefficients:
(Intercept) b$col1 b$col2 b$col3 b$col4
16 -3 NA NA NA
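Rows 2 and 3 are dropped because b$col2 is NA there, which leaves only two complete cases. Two observations can identify at most two coefficients, so only the intercept and the b$col1 slope are estimated and the remaining terms are reported as NA.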
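If you want to check this yourself, here is a small sketch with the question's data typed in by hand. The names d, fit and fit_mv are mine. It counts the complete cases behind the fit above, and then shows what a matrix on the left-hand side does: lm() fits one regression per column of the matrix and returns an "mlm" object.

```r
# Data frames from the question, typed in by hand
a <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(2, NA, 2, 3),
                col3 = c(1, 2, 3, 4), col4 = c(4, 3, 2, 1))
b <- data.frame(col1 = c(5, 2, 3, 4), col2 = c(2, NA, NA, 3),
                col3 = c(1, 2, 3, 4), col4 = c(4, 3, 2, 1))

# Response a$col1 next to the four predictors from b
d <- data.frame(y = a$col1, b)
sum(complete.cases(d))   # 2: only rows 1 and 4 have no NA

# Same model as above, written with a data argument
fit <- lm(y ~ col1 + col2 + col3 + col4, data = d)
coef(fit)                # intercept and col1 are estimated, the rest are NA

# Matrix response: one regression per column of a, all on b$col1.
# Row 2 is dropped because a$col2 is NA there.
fit_mv <- lm(as.matrix(a) ~ col1, data = b)
class(fit_mv)            # includes "mlm"
dim(coef(fit_mv))        # 2 coefficients (rows) by 4 responses (columns)
```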
Upvotes: 4