Reputation: 7435
I'm working on Fama-McBeth regressions and am trying to average portfolio groups. To do this, I've maximized the variance of portfolio groups and determined ABC, FHI, GED, are the appropriate groupings. Now I need to take the average return across time series and group them together. I'm using individual for
loops to accomplish this, but am looking for a more "R" way to do this either through dplyr
or lapply
or any other method appropriate, but am not sure how to accomplish this. I'm trying to get out of the for
loop mind set and would appreciate any help.
dput :
data <- structure(list(`NA` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
), Index = c(29.65, -11.91, 14.73, 27.68, 5.18, 25.97, 10.64,
1.02, 18.82, 23.92, -41.61, -6.64), Factor = c(-9.84, 6.46, 16.12,
-16.51, 17.82, -13.31, -3.52, 8.43, 8.23, 7.06, -15.74, 2.03),
A = c(33.88, -49.87, 65.14, 14.46, 15.67, -32.17, -31.55,
-23.79, -4.59, -8.03, 78.22, 4.75), B = c(-25.2, 24.7, -25.04,
-38.64, 61.93, 44.94, -74.65, 47.02, 28.69, 48.61, -85.02,
42.95), C = c(36.48, -25.11, 18.91, -23.31, 63.95, -19.56,
50.18, -42.28, -0.54, 23.65, -0.79, -48.6), D = c(42.89,
-54.39, -39.86, -0.72, -32.82, 69.42, 74.52, 28.61, 2.32,
26.26, -68.7, 26.27), E = c(-39.89, 44.92, -3.91, -3.21,
44.26, 90.43, 15.38, -17.64, 42.36, -3.65, -85.71, 13.24),
F = c(39.67, -54.33, -5.69, 92.39, -42.96, 76.72, 21.95,
28.83, 18.93, 23.31, -45.64, -34.34), G = c(74.57, -79.76,
26.73, -3.82, 101.67, 1.72, -43.95, 98.01, -2.45, 15.36,
2.27, -54.47), H = c(40.22, -71.58, 14.49, 13.74, 24.24,
77.22, -13.4, 28.12, 37.65, 80.59, -72.47, -1.5), I = c(90.19,
-26.64, 18.14, 0.09, 8.98, 72.38, 28.95, 39.41, 94.67, 52.51,
-80.26, -24.46)), .Names = c("NA", "Index", "Factor", "A",
"B", "C", "D", "E", "F", "G", "H", "I"), row.names = c(NA, -12L
), class = "data.frame")
For Loops :
# Get Group 1 (ABC) averages
vec <- NULL
ABC <- NULL
for (i in 1:12) {
avg <- (data[i,4] + data[i,5] + data[i,6])/3
vec <- append(vec, avg)
assign(paste(stocks[1], stocks[2], stocks[3], sep = ""), vec)
}
# Get Group 2 (FHI) averages
vec <- NULL
FHI <- NULL
for (i in 1:12) {
avg <- (data[i,9] + data[i,11] + data[i,12])/3
vec <- append(vec, avg)
assign(paste(stocks[6], stocks[8], stocks[9], sep = ""), vec)
}
# Get Group 1 (GED) averages
vec <- NULL
GED <- NULL
for (i in 1:12) {
avg <- mean(data[i,7] + data[i,8] + data[i,10])/3
vec <- append(vec, avg)
assign(paste(stocks[7], stocks[5], stocks[4], sep = ""), vec)
}
group.df <- data.frame(ABC = ABC, FHI = FHI, GED = GED, index = data$Index, factor = data$Factor) # Build data.frame with averages, index, and factor
Current (desired) output :
ABC FHI GED index factor
1 15.053333 56.693333 25.856667 29.65 -9.84
2 -16.760000 -50.850000 -29.743333 -11.91 6.46
3 19.670000 8.980000 -5.680000 14.73 16.12
4 -15.830000 35.406667 -2.583333 27.68 -16.51
5 47.183333 -3.246667 37.703333 5.18 17.82
6 -2.263333 75.440000 53.856667 25.97 -13.31
7 -18.673333 12.500000 15.316667 10.64 -3.52
8 -6.350000 32.120000 36.326667 1.02 8.43
9 7.853333 50.416667 14.076667 18.82 8.23
10 21.410000 52.136667 12.656667 23.92 7.06
11 -2.530000 -66.123333 -50.713333 -41.61 -15.74
12 -0.300000 -20.100000 -4.986667 -6.64 2.03
Upvotes: 0
Views: 83
Reputation: 145755
The simple answer is
library(dplyr)
data %>% mutate(ABC = (A + B + C) / 3,
FHI = (F + H + I) / 3,
GED = (G + E + D) / 3) %>%
select(ABC, FHI, GED, Index, Factor)
This does what you have in more or less the same way. (I omitted the mean
of a single value you have in the GED code.) It doesn't automatically paste together the names. If you want to do that, then I'd expect you to provide a grouping key, something like
grouping = data.frame(stock = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
group = c(1, 1, 1, 3, 3, 2, 3, 2, 2))
Then things could be done more automatically. Since your example manually hard-codes the column numbers in a seemingly arbitrary grouping, making the column names the only programmatic part doesn't add much.
To do things automatically, you would want to melt
your data into a long form, then join to the grouping df, then you could calculate averages after a group_by(group, NA)
clause, and widen your data again at the end with dcast
. (BTW, "NA"
is a terrible choice for a column name, I'd recommend "row.number" or "time" or anything that's not a special character.)
I like the dplyr above, but you're doing a lot of unnecessary stuff. Even in base R, you could get where you're going much more easily, something like
data$ABC = with(data, (A + B + C) / 3)
data$FHI = with(data, (F + H + I) / 3)
data$GED = with(data, (G + E + D) / 3)
data = data[, c("ABC", "FHI", "GED", "Index", "Factor")]
Upvotes: 1