Panel regression errors

Question

I am trying to run a panel regression in which the dependent variable (stock returns for various companies) is regressed on five independent variables. Here is a reproducible example of the data frame of independent variables:

dput(factors_1[1:10,])
structure(list(Date = 200002:200011, Mkt.RF = c(5.94, 0.66, -5.58, 
-0.09, 0.67, -1.58, -1.61, -4.99, -2.71, -4.55), SMB = c(0.84, 
-5.15, -4.62, 0.16, 0.33, -0.69, 0.68, 2.35, -6.1, -0.78), HML = c(-9.45, 
3.33, 5.93, 6.17, 3.14, 3.31, -0.5, 2.64, 7.54, 11.15), RMW = c(3.55, 
-2.59, -1.53, -3.38, -3.45, -0.12, -1.27, 1.63, 2.7, 0.79), CMA = c(-7.33, 
4.96, 1.32, 4.94, 1.22, -0.12, 0.64, 2.16, 4.1, 8.75), RF = c(0.43, 
0.47, 0.46, 0.5, 0.4, 0.48, 0.5, 0.51, 0.56, 0.51)), row.names = c(NA, 
10L), class = "data.frame")

and here is the data frame of stock returns:

dput(xx[1:10, 1:10])
structure(list(Date = structure(c(10990.9954886386, 11019.9953776753, 
11050.9954014418, 11080.9952984982, 11111.9953776753, 11141.9951640545, 
11172.995061378, 11203.9951324494, 11233.9950455918, 11264.9949982497
), class = "Date"), X1 = c(0.0954887438827963, 
-0.0596463008222008, 0.071350885788402, 0.0305926490738153, 0.0408331711459304, 
-0.0211402933162625, -0.00493862203119688, 0.006182173191563, 
0.0032423131269943, 0.0193884936176278), X2 = c(-0.123462974283698, 
0.230503533400868, -0.0272942506612435, 0.0483790669291113, -0.0595278152717571, 
0.12087834022411, -0.032011380068422, -0.0813892896957779, 0.0138779835292666, 
0.0726322608057619), X3 = c(-0.0682052985267971, 0.172249290323711, 
-0.154888201350603, 0.0395159403332963, -0.0143942598523314, 
-0.0607566985291722, -0.0310708779173386, -0.0746345858888015, 
-0.151109426840925, 0.0118888362760825), X4 = c(-0.114511361380472, 
0.00998441685033158, 0.192522150537581, -0.0158023343537101, 
0.0374730915541921, 0.0777493327863055, -0.0016218724457906, 
-0.0635452365157563, 0.0565030106039299, 0.115759209185826), 
   X5 = c(0.00389199996406542, -0.0212889913893688, 
    0.164892967212694, -0.00832469019706505, -0.00462232472270219, 
    -0.0070177636719938, 0.00453659662769512, 0.0528941822866427, 
    -0.00836737746775751, -0.0050017502848112), X6 = c(-0.10351479457366, 
    0.0237125822002096, 0.0101860439504515, 0.0111924296807739, 
    -0.0652473862813747, 2.11404059631271e-05, 0.0261396151198399, 
    -0.0356789492292369, -0.0706069184275196, -0.0656535040135704
    ), X7 = c(-0.0980023956049211, 0.102552120231041, 
    -0.0959174074104425, -0.0790740833989735, 0.118610740782993, 
    -0.100050822390369, -0.00333557692764708, -0.0368703292701125, 
    0.0628135821343774, 0.0471186471744018), X9 = c(-0.0304322345046196, 
    -0.0977595796246631, 0.138258584646108, 0.0344876873979214, 
    -0.000721154371596811, 0.0508635363751093, 0.0533435865577603, 
    -0.0506646520146184, 0.0497235991059199, 0.0284083879640369
    ), X9 = c(-0.159712703662352, -0.0234902492510041, 0.116858931667507, 
    0.00432376896685471, 0.114340108193219, 0.00235829911414087, 
    -0.0573195744121493, 0.095634961997471, -0.0871461890063988, 
    -0.0738243041819919)), row.names = c(NA, 10L), class = "data.frame")

What I tried:

p1_q1_l<-plm(as.matrix(data.frame(xx[, -1]))~factors_1$Mkt.RF+factors_1$SMB+factors_1$HML+factors_1$RMW+factors_1$CMA,data=factors_1, method="within")

And this is what I got:

Error in tapply(x, effect, func, ...) : arguments must have same length

I don't understand what is going on. Both tables are data frames with the same number of observations. How can I fix this?

jay.sf · Accepted Answer

It is very likely that the error arises because you supply a matrix as your dependent (Y) variable, where a vector is needed. You need the data in long format, where Y is a single column and an ID column and a time column identify the individual observations.
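A minimal base-R illustration of the length mismatch, assuming plm ends up calling tapply() with the flattened response and a per-row index (the error message points that way, but this is an assumption about plm's internals, not something the answer confirms). A 10 x 2 matrix holds 20 values, while an index built from 10 rows has only 10 entries; in long format every value gets its own id and time entry.

```r
# Wide response: 10 months (rows) x 2 firms (columns), like xx[, -1]
y_wide <- matrix(seq_len(20), nrow = 10, ncol = 2)
# One index entry per row, as a 10-row data frame such as factors_1 provides
row_index <- rep(1:5, times = 2)

# tapply() refuses inputs of different lengths: 20 values vs. 10 index entries
err <- tryCatch(tapply(y_wide, row_index, mean), error = function(e) e)
conditionMessage(err)

# Long format: each response value carries its own firm id and month
y_long <- data.frame(id    = rep(1:2, each = 10),
                     month = rep(1:10, times = 2),
                     y     = as.vector(y_wide))  # column-major, so firm 1 first
nrow(y_long)
```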

I have some doubts about whether your two data sets are compatible, though; you will probably want to merge them into one. Look carefully at how your original data can be merged, particularly regarding the Date columns.

If I understand your xx data correctly, the X* columns are the different firms. First convert the data to long format using reshape().

xxx <- reshape(xx, timevar=1, varying=2:9, direction="long", sep="")
xxx$Date <- as.character(xx$Date[xxx$Date])
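In the reshape() call above, `timevar=1` makes reshape() write the numeric suffix of each X* column (the firm number) into column 1, so `Date` ends up holding the firm and `id` the row (month); the head(dat) output further down reflects this, listing the X1 returns of successive months under ids 1 to 6. If firms should be the individuals and months the time dimension, a sketch that names both roles explicitly could look like this. `xx_demo` is a hypothetical three-month, three-firm excerpt of xx (values copied from the question's dput).

```r
# Hypothetical excerpt of xx: three months, three firms
xx_demo <- data.frame(
  Date = as.Date(c("2000-02-03", "2000-03-03", "2000-04-03")),
  X1 = c(0.0954887438827963, -0.0596463008222008, 0.071350885788402),
  X2 = c(-0.123462974283698, 0.230503533400868, -0.0272942506612435),
  X3 = c(-0.0682052985267971, 0.172249290323711, -0.154888201350603)
)

firm_cols <- c("X1", "X2", "X3")
long <- reshape(xx_demo,
                direction = "long",
                varying   = list(firm_cols),       # columns to stack, one per firm
                v.names   = "X",                   # name of the stacked return column
                timevar   = "id",                  # new column holding the firm number
                times     = seq_along(firm_cols),  # firm numbers 1, 2, 3
                idvar     = "Date")                # existing column identifying each wide row
long$Date <- as.character(long$Date)
rownames(long) <- NULL
long
```

Here `id` is the firm and `Date` the real month, which is the order plm expects in `index=c("id", "Date")`.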

Then it is easier to merge the two data sets into one. The Date columns of the two data frames, however, don't match. If I understand your factors_1 data correctly, they are monthly values (YYYYMM). For now I'll simply append the day "03" to the factors_1 dates so that they match, but you know what is actually needed.

# turn YYYYMM into a "YYYY-MM-03" string so that it matches the dates in xxx
factors_1x <- transform(factors_1,
                        Date=as.character(as.Date(paste0(Date, "03"), format="%Y%m%d")))
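Appending a fixed day such as "03" only works as long as every date in xx falls on that same day of the month (in the excerpt shown, they all fall on the 3rd). A sketch that instead keys both sources by calendar month, using two hypothetical helpers, `month_from_yyyymm()` and `month_from_date()`:

```r
# Hypothetical helpers: turn both date formats into a "YYYY-MM" month key
month_from_yyyymm <- function(x) paste0(substr(x, 1, 4), "-", substr(x, 5, 6))
month_from_date   <- function(d) format(d, "%Y-%m")

# First three rows of each source, taken from the question's dput output
factors_demo <- data.frame(Date = 200002:200004, Mkt.RF = c(5.94, 0.66, -5.58))
dates_demo   <- as.Date(c(10990.9954886386, 11019.9953776753, 11050.9954014418),
                        origin = "1970-01-01")

factors_demo$Month <- month_from_yyyymm(factors_demo$Date)
factors_demo$Month
month_from_date(dates_demo)
```

With such a key, the day component (and the fractional part of the xx dates) no longer affects the merge.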

Now merge the two.

dat <- merge(xxx, factors_1x, all.x=TRUE)
head(dat)
#         Date           X id Mkt.RF  SMB   HML  RMW   CMA   RF
# 1 2000-02-03  0.09548874  1   5.94 0.84 -9.45 3.55 -7.33 0.43
# 2 2000-02-03 -0.05964630  2   5.94 0.84 -9.45 3.55 -7.33 0.43
# 3 2000-02-03  0.07135089  3   5.94 0.84 -9.45 3.55 -7.33 0.43
# 4 2000-02-03  0.03059265  4   5.94 0.84 -9.45 3.55 -7.33 0.43
# 5 2000-02-03  0.04083317  5   5.94 0.84 -9.45 3.55 -7.33 0.43
# 6 2000-02-03 -0.02114029  6   5.94 0.84 -9.45 3.55 -7.33 0.43
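Because `all.x=TRUE` keeps return rows that found no matching factor row, a key mismatch shows up as NA factor values rather than as an error. A sketch of a quick check, with hypothetical `returns_long` and `factors_m` tables whose numbers are illustrative only; the last return row deliberately has no matching month.

```r
returns_long <- data.frame(Month = c("2000-02", "2000-02", "2000-03", "2000-12"),
                           id    = c(1L, 2L, 1L, 1L),
                           X     = c(0.0955, -0.1235, -0.0596, 0.0100),
                           stringsAsFactors = FALSE)
factors_m <- data.frame(Month  = c("2000-02", "2000-03"),
                        Mkt.RF = c(5.94, 0.66),
                        stringsAsFactors = FALSE)

# Name the key explicitly instead of relying on whichever columns the tables share
dat_demo <- merge(returns_long, factors_m, by = "Month", all.x = TRUE)

# Rows whose month has no factor data; they would typically be dropped from the regression
unmatched <- dat_demo[is.na(dat_demo$Mkt.RF), ]
unmatched
```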

Now the formula is easy to write. Pass the new index columns to plm() with index=c("id", "Date").

library(plm)
# plm() selects the estimator with `model` ("within" is also the default)
p1_q1_l <- plm(X ~ Mkt.RF + SMB + HML + RMW + CMA + RF, model="within",
               index=c("id", "Date"), data=dat)
# Model Formula: X ~ Mkt.RF + SMB + HML + RMW + CMA + RF
# 
# Coefficients:
#     Mkt.RF        SMB        HML        RMW        CMA         RF 
#  0.0042267  0.0054278 -0.0016806  0.0129446  0.0148160 -0.4194726
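With `model="within"` and plm's default `effect="individual"`, each individual's own mean is subtracted from the response and the regressors before OLS. A base-R sketch of that idea on simulated data (the panel, its column names and the true slope of 1.2 are all made up) checks that the demeaned regression gives the same slope as an ordinary regression with one dummy per firm:

```r
set.seed(1)
n_firm <- 4; n_month <- 12
panel <- data.frame(id    = rep(1:n_firm, each = n_month),
                    month = rep(1:n_month, times = n_firm))
mkt <- rnorm(n_month)                 # one factor value per month, shared by all firms
panel$Mkt.RF <- mkt[panel$month]
# firm-specific level (0.5 * id) plus a common slope of 1.2 plus noise
panel$X <- 0.5 * panel$id + 1.2 * panel$Mkt.RF + rnorm(nrow(panel), sd = 0.1)

# Within transformation: subtract each firm's own mean
demean <- function(v, g) v - ave(v, g)
within_fit <- lm(demean(X, id) ~ demean(Mkt.RF, id) - 1, data = panel)

# Least-squares dummy-variable (LSDV) regression: one intercept per firm
lsdv_fit <- lm(X ~ Mkt.RF + factor(id), data = panel)

c(within = unname(coef(within_fit)), lsdv = unname(coef(lsdv_fit)["Mkt.RF"]))
```

The two slopes agree exactly (Frisch-Waugh-Lovell); demeaning simply avoids estimating one dummy coefficient per firm.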
