Averaging groups over time series

Question

I'm working on Fama-McBeth regressions and am trying to average portfolio groups. To do this, I've maximized the variance of portfolio groups and determined ABC, FHI, GED, are the appropriate groupings. Now I need to take the average return across time series and group them together. I'm using individual for loops to accomplish this, but am looking for a more "R" way to do this either through dplyr or lapply or any other method appropriate, but am not sure how to accomplish this. I'm trying to get out of the for loop mind set and would appreciate any help.

dput :

    data <- structure(list(`NA` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
), Index = c(29.65, -11.91, 14.73, 27.68, 5.18, 25.97, 10.64, 
1.02, 18.82, 23.92, -41.61, -6.64), Factor = c(-9.84, 6.46, 16.12, 
-16.51, 17.82, -13.31, -3.52, 8.43, 8.23, 7.06, -15.74, 2.03), 
    A = c(33.88, -49.87, 65.14, 14.46, 15.67, -32.17, -31.55, 
    -23.79, -4.59, -8.03, 78.22, 4.75), B = c(-25.2, 24.7, -25.04, 
    -38.64, 61.93, 44.94, -74.65, 47.02, 28.69, 48.61, -85.02, 
    42.95), C = c(36.48, -25.11, 18.91, -23.31, 63.95, -19.56, 
    50.18, -42.28, -0.54, 23.65, -0.79, -48.6), D = c(42.89, 
    -54.39, -39.86, -0.72, -32.82, 69.42, 74.52, 28.61, 2.32, 
    26.26, -68.7, 26.27), E = c(-39.89, 44.92, -3.91, -3.21, 
    44.26, 90.43, 15.38, -17.64, 42.36, -3.65, -85.71, 13.24), 
    F = c(39.67, -54.33, -5.69, 92.39, -42.96, 76.72, 21.95, 
    28.83, 18.93, 23.31, -45.64, -34.34), G = c(74.57, -79.76, 
    26.73, -3.82, 101.67, 1.72, -43.95, 98.01, -2.45, 15.36, 
    2.27, -54.47), H = c(40.22, -71.58, 14.49, 13.74, 24.24, 
    77.22, -13.4, 28.12, 37.65, 80.59, -72.47, -1.5), I = c(90.19, 
    -26.64, 18.14, 0.09, 8.98, 72.38, 28.95, 39.41, 94.67, 52.51, 
    -80.26, -24.46)), .Names = c("NA", "Index", "Factor", "A", 
"B", "C", "D", "E", "F", "G", "H", "I"), row.names = c(NA, -12L
), class = "data.frame")

For Loops :

# Get Group 1 (ABC) averages
vec <- NULL
ABC <- NULL
for (i in 1:12) {
  avg <- (data[i,4] + data[i,5] + data[i,6])/3
  vec <- append(vec, avg)
  assign(paste(stocks[1], stocks[2], stocks[3], sep = ""), vec)
}

# Get Group 2 (FHI) averages
vec <- NULL
FHI <- NULL
for (i in 1:12) {
  avg <- (data[i,9] + data[i,11] + data[i,12])/3
  vec <- append(vec, avg)
  assign(paste(stocks[6], stocks[8], stocks[9], sep = ""), vec)
}

# Get Group 1 (GED) averages
vec <- NULL
GED <- NULL
for (i in 1:12) {
  avg <- mean(data[i,7] + data[i,8] + data[i,10])/3
  vec <- append(vec, avg)
  assign(paste(stocks[7], stocks[5], stocks[4], sep = ""), vec)
}

group.df <- data.frame(ABC = ABC, FHI = FHI, GED = GED, index = data$Index, factor = data$Factor)    # Build data.frame with averages, index, and factor

Current (desired) output :

          ABC        FHI        GED  index factor
1   15.053333  56.693333  25.856667  29.65  -9.84
2  -16.760000 -50.850000 -29.743333 -11.91   6.46
3   19.670000   8.980000  -5.680000  14.73  16.12
4  -15.830000  35.406667  -2.583333  27.68 -16.51
5   47.183333  -3.246667  37.703333   5.18  17.82
6   -2.263333  75.440000  53.856667  25.97 -13.31
7  -18.673333  12.500000  15.316667  10.64  -3.52
8   -6.350000  32.120000  36.326667   1.02   8.43
9    7.853333  50.416667  14.076667  18.82   8.23
10  21.410000  52.136667  12.656667  23.92   7.06
11  -2.530000 -66.123333 -50.713333 -41.61 -15.74
12  -0.300000 -20.100000  -4.986667  -6.64   2.03

Gregor Thomas · Accepted Answer

The simple answer is

library(dplyr)
data %>% mutate(ABC = (A + B + C) / 3,
                FHI = (F + H + I) / 3,
                GED = (G + E + D) / 3) %>%
    select(ABC, FHI, GED, Index, Factor)

This does what you have in more or less the same way. (I omitted the mean of a single value you have in the GED code.) It doesn't automatically paste together the names. If you want to do that, then I'd expect you to provide a grouping key, something like

grouping = data.frame(stock = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
                      group = c(1, 1, 1, 3, 3, 2, 3, 2, 2))

Then things could be done more automatically. Since your example manually hard-codes the column numbers in a seemingly arbitrary grouping, making the column names the only programmatic part doesn't add much.

To do things automatically, you would want to melt your data into a long form, then join to the grouping df, then you could calculate averages after a group_by(group, NA) clause, and widen your data again at the end with dcast. (BTW, "NA" is a terrible choice for a column name, I'd recommend "row.number" or "time" or anything that's not a special character.)

Cleaning up base R

I like the dplyr above, but you're doing a lot of unnecessary stuff. Even in base R, you could get where you're going much more easily, something like

data$ABC = with(data, (A + B + C) / 3)
data$FHI = with(data, (F + H + I) / 3)
data$GED = with(data, (G + E + D) / 3)
data = data[, c("ABC", "FHI", "GED", "Index", "Factor")]

Averaging groups over time series

Answers (1)

Cleaning up base R

Related Questions