Reputation: 73
I am trying to calculate the fama french factors in r. after several days of sweat and despair I managed to calculate the returns of the 6 respective portfolios...only to find a problem i just dont seem to be able to solve.
my data approximately looks like this, this is just a simplified data set to illustrate my problem:
> TestX = data.table(Group = c("SM", "SM", "SM", "SH", "SH", "SH", "SL", "SL", "SL"), Date= as.Date(c("1995-07-30","1995-07-30","1995-07-30","1995-07-30","1995-07-30","1995-07-30","1995-07-30","1995-07-30", "1995-07-30")), Code= c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9"), SMRet = c(2,3,3, NA, NA, NA, NA, NA, NA), SHRet = c(NA, NA, NA, 5,5,5, NA, NA, NA), SLRet = c(NA, NA, NA, NA, NA, NA, 0,1,2) )
> TestX
Group Date Code SMRet SHRet SLRet
1: SM 1995-07-30 C1 2 NA NA
2: SM 1995-07-30 C2 3 NA NA
3: SM 1995-07-30 C3 3 NA NA
4: SH 1995-07-30 C4 NA 5 NA
5: SH 1995-07-30 C5 NA 5 NA
6: SH 1995-07-30 C6 NA 5 NA
7: SL 1995-07-30 C7 NA NA 0
8: SL 1995-07-30 C8 NA NA 1
9: SL 1995-07-30 C9 NA NA 2
Group gives the group (SmallMedium, SmallHigh, SmallLow, I have the other groups in the real data.table). Code gives the respective company codes etc.
what I want to do is create a new column with the respective factors.
for that I need to do the following calculation:
(Smret+SHret+SLret)/3
, but how can I do that?
TestX[, Factor := (SMRet+SHRet+SLRet)/3, by = Date]
didnt work, I only got NAs everywhere.
Group Date Code SMRet SHRet SLRet Factor
1: SM 1995-07-30 C1 2 NA NA NA
2: SM 1995-07-30 C2 3 NA NA NA
3: SM 1995-07-30 C3 3 NA NA NA
4: SH 1995-07-30 C4 NA 5 NA NA
5: SH 1995-07-30 C5 NA 5 NA NA
6: SH 1995-07-30 C6 NA 5 NA NA
7: SL 1995-07-30 C7 NA NA 0 NA
8: SL 1995-07-30 C8 NA NA 1 NA
9: SL 1995-07-30 C9 NA NA 2 NA
I also need to group by date. the real data.table has 402 other months.
Thanks in advance.
EDIT: here is a better data.table to illustrate my problem
TestX = data.table(Group = c("SM", "SM", "SH", "SH", "SL", "SL", "SM", "SM", "SH", "SH", "SL", "SL"), Date= as.Date(c("1995-07-30","1995-07-30","1995-07-30","1995-07-30","1995-07-30","1995-07-30","1995-08-30","1995-08-30", "1995-08-30", "1995-08-30","1995-08-30","1995-08-30")), Code= c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "c10", "c11", "12"), SMRet = c(2,3, NA, NA, NA, NA, 4, 5, NA, NA, NA, NA), SHRet = c(NA, NA, 5, 5, NA, NA, NA, NA, 3, 4, NA, NA), SLRet = c(NA, NA, NA, NA, 0, 1, NA,NA,NA, NA, 2,3))
> TestX
Group Date Code SMRet SHRet SLRet
1: SM 1995-07-30 C1 2 NA NA
2: SM 1995-07-30 C2 3 NA NA
3: SH 1995-07-30 C3 NA 5 NA
4: SH 1995-07-30 C4 NA 5 NA
5: SL 1995-07-30 C5 NA NA 0
6: SL 1995-07-30 C6 NA NA 1
7: SM 1995-08-30 C7 4 NA NA
8: SM 1995-08-30 C8 5 NA NA
9: SH 1995-08-30 C9 NA 3 NA
10: SH 1995-08-30 c10 NA 4 NA
11: SL 1995-08-30 c11 NA NA 2
12: SL 1995-08-30 12 NA NA 3
and this is the desired result:
Group Date Code SMRet SHRet SLRet Factor
1: SM 1995-07-30 C1 2 NA NA 5.333333
2: SM 1995-07-30 C2 3 NA NA 5.333333
3: SH 1995-07-30 C3 NA 5 NA 5.333333
4: SH 1995-07-30 C4 NA 5 NA 5.333333
5: SL 1995-07-30 C5 NA NA 0 5.333333
6: SL 1995-07-30 C6 NA NA 1 5.333333
7: SM 1995-08-30 C7 4 NA NA 7.000000
8: SM 1995-08-30 C8 5 NA NA 7.000000
9: SH 1995-08-30 C9 NA 3 NA 7.000000
10: SH 1995-08-30 c10 NA 4 NA 7.000000
11: SL 1995-08-30 c11 NA NA 2 7.000000
12: SL 1995-08-30 12 NA NA 3 7.000000
so: for each month:(SMRet+ShRet+SLRet)/3
Upvotes: 1
Views: 361
Reputation: 140
I think the tidyverse
is a convenient package for this task, yet it may be not as fast as data.table
. you can easily calculate by group with group_by()
:
library(tidyverse)
TestX %>%
gather("GP", "value", -Group,-Code, -Date) %>%
filter(!is.na(value)) %>% # gather the 3 columns into 1, and then compute
arrange(Date,Group, Code) %>%
group_by(Date) %>% # group before compute the results, you can also group by Code
mutate(Factor = sum(value)/3)
# A tibble: 12 x 6
# Groups: Date [2]
Group Date Code GP value Factor
<chr> <date> <chr> <chr> <dbl> <dbl>
1 SH 1995-07-30 C3 SHRet 5 5.33
2 SH 1995-07-30 C4 SHRet 5 5.33
3 SL 1995-07-30 C5 SLRet 0 5.33
4 SL 1995-07-30 C6 SLRet 1 5.33
5 SM 1995-07-30 C1 SMRet 2 5.33
6 SM 1995-07-30 C2 SMRet 3 5.33
7 SH 1995-08-30 c10 SHRet 4 7
8 SH 1995-08-30 C9 SHRet 3 7
9 SL 1995-08-30 12 SLRet 3 7
10 SL 1995-08-30 c11 SLRet 2 7
11 SM 1995-08-30 C7 SMRet 4 7
12 SM 1995-08-30 C8 SMRet 5 7
Upvotes: 1
Reputation: 6073
You can use the following code to calculate the fama french factors in R:
TestX[ , newvar := sum(SMRet, SHRet, SLRet, na.rm=TRUE)/3, by=Date]
Upvotes: 2