Reputation: 31
I'm trying to run a regression for every zipcode in my dataset and save the coefficients to a data frame but I'm having trouble.
Whenever I run the code below, I get a data frame called "coefficients" containing every zip code but with the intercept and coefficient for every zipcode being equal to the results of the simple regression lm(Sealed$hhincome ~ Sealed$square_footage).
When I run the code as indicated in Ranmath's example at the link below, everything works as expected. I'm new to R after many years with STATA, so any help would be greatly appreciated :)
R extract regression coefficients from multiply regression via lapply command
library(plyr)
Sealed <- read.csv("~/Desktop/SEALED.csv")
x <- function(df) {
lm(Sealed$hhincome ~ Sealed$square_footage)
}
regressions <- dlply(Sealed, .(Sealed$zipcode), x)
coefficients <- ldply(regressions, coef)
Upvotes: 3
Views: 688
Reputation: 226332
Because dlply takes a ... argument that allows additional arguments to be passed to the function, you can make things even simpler:
dlply(Sealed,.(zipcode),lm,formula=hhincome~square_footage)
The first two arguments to lm are formula and data. Since formula is specified here, lm will pick up the next argument it is given (the relevant zipcode-specific chunk of Sealed) as the data argument ...
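To see the argument matching at work without touching the real file, here is a minimal sketch. The toy data frame is made up and only stands in for SEALED.csv, and I've used base R's split and lapply in place of dlply so it runs without plyr; lapply forwards extra arguments to the function in the same way.

```r
# Hypothetical toy data standing in for SEALED.csv (values are made up).
Sealed <- data.frame(
  zipcode        = rep(c("10001", "20002"), each = 4),
  square_footage = rep(1:4, times = 2)
)
# Give each zipcode its own exact linear relationship.
Sealed$hhincome <- ifelse(Sealed$zipcode == "10001",
                          10 + 2 * Sealed$square_footage,
                          5 + 3 * Sealed$square_footage)

# formula is matched by name, so each chunk from split() is matched
# positionally to lm's next formal argument, which is data.
fits <- lapply(split(Sealed, Sealed$zipcode), lm,
               formula = hhincome ~ square_footage)

# One row of coefficients per zipcode.
coefs <- t(sapply(fits, coef))
print(coefs)
```

Each row of coefs should reflect the relationship built into that zipcode's rows rather than one pooled fit.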
Upvotes: 3
Reputation: 55360
The issue is not with plyr but rather with the definition of the function. Your function takes an argument, df, but never does anything with it.
As an analogy,
myFun <- function(x) {
3 * 7
}
> myFun(2)
[1] 21
> myFun(578)
[1] 21
If you run this function on different values of x, it will still give you 21, no matter what x is, because there is no reference to x within the function body. In my silly example the correction is obvious; in your function above, the confusion is understandable, since $hhincome and $square_footage look as though they should serve as the variables.
But what you want to vary is what comes before the $. As @Joran correctly pointed out, swap Sealed$hhincome with df$hhincome (and same for $squ..) and that will help.
Upvotes: 1
Reputation: 173577
You are applying the function:
x <- function(df) {
lm(Sealed$hhincome ~ Sealed$square_footage)
}
to each subset of your data, so we shouldn't be surprised that the output each time is exactly
lm(Sealed$hhincome ~ Sealed$square_footage)
right? Try replacing Sealed with df inside your function. That way you're referring to the variables in each individual piece passed to the function, not the whole variable in the data frame Sealed.
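Here is a small sketch of the difference, under some assumptions: the toy data is made up, the names buggy and fixed are just for illustration, and base R's split stands in for the per-zipcode splitting that dlply does. I've also written the corrected function with a data argument instead of df$, which fits the same regression on each piece.

```r
# Hypothetical toy data standing in for SEALED.csv (values are made up).
Sealed <- data.frame(
  zipcode        = rep(c("10001", "20002"), each = 4),
  square_footage = rep(1:4, times = 2)
)
Sealed$hhincome <- ifelse(Sealed$zipcode == "10001",
                          10 + 2 * Sealed$square_footage,
                          5 + 3 * Sealed$square_footage)

# Original version: df is ignored, so every call fits the whole data set.
buggy <- function(df) lm(Sealed$hhincome ~ Sealed$square_footage)

# Corrected version: the regression only sees the piece it was handed.
fixed <- function(df) lm(hhincome ~ square_footage, data = df)

chunks <- split(Sealed, Sealed$zipcode)
buggy_coefs <- t(sapply(chunks, function(d) coef(buggy(d))))
fixed_coefs <- t(sapply(chunks, function(d) coef(fixed(d))))

print(buggy_coefs)  # same row repeated for every zipcode
print(fixed_coefs)  # one distinct fit per zipcode
```

With plyr loaded, passing fixed to dlply(Sealed, .(zipcode), fixed) and then ldply(regressions, coef) should follow the same pattern as the code in the question.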
Upvotes: 2