Reputation: 17
I'm currently running a series of regressions on a data set of 72,000 observations with roughly 20 variables per observation. One of these variables takes 56 distinct names, and I want to run a separate regression for each name. I imagine I would write a for loop for this, but I am a little inexperienced at working with data sets of this size.
The variable containing the name is not in the regression.
NAME: the variable whose values I want to loop over, running one regression per name.
My Code:
my.mods = lapply(s.dat, FUN = function(x) {
lm(log(TM+1000) ~ log(Inc+1) + log(Slip_sq_K+1) +
log(Ten+1) + log(HF+1) + log(BroP+1) + log(B+1) +
log(sian+1) + log(H_+1) + log(C_65+1) + log(D+1) +
log(TIP+1) + log(p+1) + log(X34itP+1) + log(FGK+1) +
log(X19tP+1) + log(X2nitP+1) + log(Car_AloneP+1) +
log(CaoledP+1) +log(PTsP+1) + log(Gy+1) +
log(OthemeansP+1) + log(HP+1) + log(Coi+1) +
log(electr) + log(Na+1),
data = x, na.action=na.exclude)
} )
Thanks!
Upvotes: 0
Views: 413
Reputation: 558
I prefer to work without loops if possible, and I found this page really helpful. It shows how you can fit a model inside a plyr function and get back a table with all the important parameters.
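A minimal sketch of that plyr approach, assuming the plyr package is installed. It uses the built-in mtcars data with cyl as a stand-in for your NAME column and mpg ~ wt as a stand-in for your formula; none of these names come from your data.

```r
library(plyr)

data(mtcars)

# Fit one model per group; dlply returns a list with one lm object per value of cyl
models <- dlply(mtcars, .(cyl), function(d) lm(mpg ~ wt, data = d))

# Collect the coefficients of every model into a single data frame,
# one row per group
coefs <- ldply(models, coef)
coefs
```

The same pattern should carry over by replacing cyl with NAME and the formula with your own.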
Upvotes: 0
Reputation: 7190
Here is my solution in base R. It is a little long because I generated example data, since I am not sure I understood what you want. If I did, you can just use the last line.
# Random data for the example
set.seed(1)
names <- c("Homer", "Bart", "Lisa")
da <- rnorm(30)
da1 <- rnorm(30, 2)
data <- data.frame(Names = rep(names, 10), da, da1)
And here is what I believe you can use:
reg <- by(data[, 2:3], data$Names, lm)
Here is the output:
reg
data$Names: Bart
Call:
FUN(formula = data[x, , drop = FALSE])
Coefficients:
(Intercept) da1
0.7738 -0.3076
--------------------------------------------------------------
data$Names: Homer
Call:
FUN(formula = data[x, , drop = FALSE])
Coefficients:
(Intercept) da1
0.14672 -0.01079
--------------------------------------------------------------
data$Names: Lisa
Call:
FUN(formula = data[x, , drop = FALSE])
Coefficients:
(Intercept) da1
-1.3974 0.7396
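The call above works because lm treats the first column of the data frame it receives as the response. A variant with an explicit formula may be easier to adapt. This is only a sketch on the same simulated data, and the formula da ~ da1 is an assumption about what the short form is doing:

```r
# Same simulated data as above
set.seed(1)
names <- c("Homer", "Bart", "Lisa")
da <- rnorm(30)
da1 <- rnorm(30, 2)
data <- data.frame(Names = rep(names, 10), da, da1)

# One regression per name, with the formula written out
reg <- by(data, data$Names, function(d) lm(da ~ da1, data = d))

# Coefficients of all models side by side: one column per name
coefs <- sapply(reg, coef)
coefs
```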
Upvotes: 0
Reputation: 18595
If you want to use loops you could subset the data for each name:
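data(mtcars)
car_names <- unique(row.names(mtcars))
models <- vector("list", length(car_names))
names(models) <- car_names
for (i in seq_along(car_names)) {
  # With real data, subset on the grouping column instead (e.g. NAME == some_name)
  sub_cars <- subset(x = mtcars, subset = row.names(mtcars) == car_names[i])
  # Use [[i]] so the whole lm object is stored; models[i] would keep only its first component
  models[[i]] <- lm(mpg ~ cyl, data = sub_cars)
}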
Upvotes: 0
Reputation: 145985
No need for loops. Just split your data and use lapply.
acs.dat.split = split(acs.dat, acs.dat$NAME)
my.mods = lapply(acs.dat.split, FUN = function(x) {
lm(log(TSM+1000) ~ log(Inc+1) + log(Slip_sq_K+1) +
log(Teen+1) + log(HFGG+1) + log(BrownP+1) + log(BlackP+1) +
log(AsianP+1) + log(H_65+1) + log(C_65+1) + log(Detachedp+1) +
log(TIP+1) + log(X2Unitp+1) + log(X34UnitP+1) + log(FGK+1) +
log(X189tP+1) + log(X20PlusUnitP+1) + log(Car_AloneP+1) +
log(CarpooledP+1) +log(PublicTransP+1) + log(Gly+1) +
log(OthermeansP+1) + log(HomeP+1) + log(CommTime+1) +
log(electr) + log(Natural_gas+1),
data = x, na.action=na.exclude)
}
)
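For a version you can run without the original data, here is a small sketch of the same split and lapply pattern on the built-in mtcars data. The grouping column cyl and the formula mpg ~ wt are stand-ins chosen for illustration, not part of the question.

```r
data(mtcars)

# Split the data frame into a list of data frames, one per group
cars_split <- split(mtcars, mtcars$cyl)

# Fit the same model to each piece; the result is a named list of lm objects
mods <- lapply(cars_split, function(d) {
  lm(mpg ~ wt, data = d, na.action = na.exclude)
})

# One row of coefficients per group
coef_table <- t(sapply(mods, coef))
coef_table

# Individual models are available by group name
summary(mods[["4"]])
```

Because the list is named by group, my.mods[["some name"]] would retrieve the model for a single name in the same way.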
Upvotes: 0