Reputation: 25
I have a data frame with a column that contains lm model output. When I index this column for a specific row with [[2]], I get the summary output of that LM. That works perfectly, but since the column has 959 rows, I want to write a for loop that runs an anova on each of these regressions. How do I address all the objects in that list in a for loop?
To make this concrete, here is a MWE:
Dataframe:
structure(list(Week = 7:17, Category = c("2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2"), Brand = c("3", "3", "3",
"3", "3", "3", "3", "3", "3", "3", "3"), Display = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), Sales = c(0, 0, 0, 0, 13.440948, 40.097397,
32.01384, 382.169189, 2830.748779, 4524.460938, 1053.590576),
Price = c(0, 0, 0, 0, 5.949999, 5.95, 5.950003, 4.87759,
3.787015, 3.205987, 4.898724), Distribution = c(0, 0, 0,
0, 1.394019, 1.386989, 1.621416, 8.209759, 8.552915, 9.692097,
9.445554), Advertising = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0), lnSales = c(11.4945151554497, 11.633214247508, 11.5862944141137,
11.5412559646132, 11.4811122484454, 11.4775106999991, 11.6333660772506,
11.4859819773102, 11.5232680456161, 11.5572670584292, 11.5303686934256
), IntrayearCycles = c(4.15446534315765, 3.62757053512638,
2.92387946552647, 2.14946414386239, 1.40455011205262, 0.768856938870769,
0.291497141953598, -0.0131078404184544, -0.162984144025091,
-0.200882782749248, -0.182877633924882), `Competitor Advertising` = c(10584.87063,
224846.3243, 90657.72553, 0, 0, 0, 2396.54212, 0, 0, 0, 40343.49444
), `Competitor Display` = c(0.385629, 2.108133, 2.515806,
4.918288, 3.81749, 3.035847, 2.463194, 3.242594, 1.850399,
1.751096, 1.337943), `Competitor Prices` = c(5.30989, 5.372752,
5.3717245, 5.3295525, 5.298393, 5.319466, 5.1958415, 5.2941095,
5.296757, 5.294059, 5.273578), ZeroSales = c(1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0)), .Names = c("Week", "Category", "Brand",
"Display", "Sales", "Price", "Distribution", "Advertising", "lnSales",
"IntrayearCycles", "Competitor Advertising", "Competitor Display",
"Competitor Prices", "ZeroSales"), row.names = 1255:1265, class = "data.frame")
Then I estimate an Error Correction Model (with the ecm package) for each group, which produces linear model output. This is applied to estimate 959 separate regressions.
library(dplyr)  # select(), group_by(), do(), %>%
library(ecm)    # ecm()

f <- function(.) {
  # The equilibrium (xeq) and transient (xtr) terms use the same predictors
  xeq <- as.data.frame(select(., lnPrice, lnAdvertising, lnDisplay, IntrayearCycles, lnCompetitorPrices, lnCompADV, lnCompDISP, ADVxDISP, ADVxCYC, DISPxCYC, ADVxDISPxCYC))
  xtr <- as.data.frame(select(., lnPrice, lnAdvertising, lnDisplay, IntrayearCycles, lnCompetitorPrices, lnCompADV, lnCompDISP, ADVxDISP, ADVxCYC, DISPxCYC, ADVxDISPxCYC))
  print(xeq)
  print(xtr)
  # Returns the summary of the fitted model, so Models$Model holds summary objects
  summary(ecm(.$lnSales, xeq, xtr, includeIntercept = TRUE))
}
Models <- DatasetThesisSynergyClean %>%
  group_by(Category, Brand) %>%
  do(Model = f(.))
To see the summary of a specific model (here model 2), you can use:
Models$Model[[2]]
Next, I want to extract specific values from this summary output. First, though, I want to extract the residual sum of squares (RSS) to do an anova. For a single list object I do this as follows:
anova_output_Unitmodels <- anova(Models$Model[[2]])
RSS_Unit <- anova_output_Unitmodels$`Sum Sq`[nrow(anova_output_Unitmodels)] #saving the RSS
Now I want to loop this across all the list objects, from object [[1]] to [[959]]. Each RSS value has to be saved, and eventually I need to sum all of them.
Furthermore, once this works, I need to extract the coefficients, t-values, and p-values of all variables from all models. For that I also need to address the specific objects in the list and put $coefficients behind them, but I was not able to manage this either.
Here is how I implemented @Roman Lustrik's answer.
extractRSS <- function(x) {
an <- anova(x)
RSS_Unit <- an$`Sum Sq`[nrow(an)]
return(RSS_Unit)
}
sapply(Model, FUN = extractRSS)
I also tried to do it for one specific model, but this gives me an error:
SapplyRSS <- sapply(Models$Model, FUN = extractRSS)
I had another idea and tried to write the for loop differently. It did not work out well, but it's a start.
For a single model you can compute the RSS as:
RSS2<- sum(Models$Model[[2]]$residuals^2)
So I tried to replicate this in a for loop:
for(i in residuals.lm){
AllRSS<- as.matrix(c(1:949))
AllRSS <- as.data.frame(AllRSS)
SumRSS <- sum(Models$Model[[i]]$residuals^2)
SumRSS <- as.data.frame(SumRSS)
TotalRSS <- cbind(SumRSS, AllRSS)}
TotalRSS <- SumRSS[NULL,]
The trouble starts with specifying i in the for statement; I do not know whether this is right. In the end I am left with an empty data frame, or a data frame that holds only the value of one and the same brand.
Upvotes: 0
Views: 118
Reputation: 25
A different way of doing this is to export all the list objects as separate objects in the global environment. You do this with:
names(Models$Model) <- paste0("C", Models$Category, "B", Models$Brand)
list2env(Models$Model, .GlobalEnv)
Then I wrote a for loop that addresses these objects and appends the values from each iteration to a data frame. This goes as follows:
for(X in c("0","1","3")){
EmptyRSS <- data.frame(RSS = 0)
ModelX <- get(paste0("C", X, "B2"))
RSS <- sum(ModelX$residuals^2)
RSS <- as.data.frame(RSS)
DF <- ModelX$df[2]
DF <- as.data.frame(DF)
RSSDF <- cbind(RSS, DF)
TotalRSS2 <- rbind(TotalRSS2, RSSDF)
}
TotalRSS2 <- RSSDF[NULL,]
You should run the command outside the loop twice, because TotalRSS2 has to exist before the loop can rbind to it.
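The same table can also be built without exporting the models to the global environment. What follows is a rough sketch, not a drop-in replacement: it assumes the list holds summary.lm objects, as stored by f() in the question, and uses the built-in mtcars data as a stand-in for the real models. The names fits and rows, and the C…B2 labels, are made up for illustration.

```r
# Stand-in for Models$Model: one summary.lm object per group (built-in mtcars data).
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) summary(lm(mpg ~ wt, data = d)))
names(fits) <- paste0("C", names(fits), "B2")  # hypothetical labels in the C<category>B<brand> pattern

# Build one single-row data frame per model: RSS and residual degrees of freedom.
rows <- lapply(fits, function(m) {
  data.frame(RSS = sum(m$residuals^2),  # residual sum of squares
             DF  = m$df[2])             # second entry of $df is the residual df
})

# Stack all rows in one step instead of growing a data frame inside a loop.
TotalRSS2 <- do.call(rbind, rows)
TotalRSS2
```

Because the rows are collected in a list first, nothing has to exist before the loop starts, and the code gives the same result on a fresh session.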
Upvotes: 0
Reputation: 70653
@MichaelChirico probably had something like this in mind.
extractRSS <- function(x) {
an <- anova(x)
RSS_Unit <- an$`Sum Sq`[nrow(an)]
return(RSS_Unit)
}
sapply(Models$Model, FUN = extractRSS)
sapply will traverse every Models$Model[[i]] object and extract the RSS. You can modify this function to perhaps include other pieces of information. The result will probably be coerced to some simpler object. You can prevent this with sapply(..., simplify = FALSE).
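As a minimal sketch of this approach, assuming the list holds fitted lm objects: the example below uses the built-in mtcars data as a stand-in for Models$Model, and the names fits, extractCoefs, rss, total_rss and coefs are made up for illustration. Note that f() in the question stores summary() output rather than the fits themselves, and anova() has no method for summary objects, so in that case the RSS would have to come from sum(x$residuals^2) instead.

```r
# Stand-in for Models$Model: a list of lm fits, one per group (built-in mtcars data).
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) lm(mpg ~ wt, data = d))

# The RSS is the last entry of the "Sum Sq" column of the anova table.
extractRSS <- function(x) {
  an <- anova(x)
  an$`Sum Sq`[nrow(an)]
}

rss <- sapply(fits, extractRSS)  # simplified to a numeric vector, one RSS per model
total_rss <- sum(rss)            # sum of the RSS values over all models

# Hypothetical helper: coefficient table (estimate, std. error, t value, p value).
extractCoefs <- function(x) coef(summary(x))

# simplify = FALSE keeps one matrix per model instead of collapsing them.
coefs <- sapply(fits, extractCoefs, simplify = FALSE)
coefs[[1]]
```

With simplify = FALSE each element of coefs stays a matrix with one row per term, so t-values and p-values can be read off by column name, for example coefs[[1]][, "t value"].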
Upvotes: 1