Reputation: 165
I am not sure my question makes sense, but I am considering modifying an econometrics model that uses time series data. It is a multiple regression. One of the independent variables is the 5-year Treasury rate, and it is split over two time periods. The first variable is the 5-year Treasury rate from 1950 to 1986; after 1986 it takes the value 0. The second is the 5-year Treasury rate from 1986 to the present; before 1986 it takes the value 0.

Someone suggested I replace the 0 values with blanks (equivalent to missing data), on the grounds that the meaning of those variables would supposedly be better specified that way. Could you do that with the subset() function? In other words, could you in effect remove or ignore the 0 values in those two variables without removing or ignoring the entire row of data, and so without losing the values of the other independent variables in that row?

I know this coding question is contingent on whether the process even makes sense, and I am not sure it does. I have put the theoretical question to Cross Validated, but I am not sure I will get an answer, so I figured I would go ahead and ask the coding question here.
Upvotes: 0
Views: 1687
Reputation: 13903
Assuming your data is in a data frame, the answer is "no." You cannot use subset on only part of a data frame. That's because subset on a data frame returns another data frame, and in a data frame all of the variables must be the same length.
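To make that concrete, here is a minimal sketch on made-up data. The data frame df and the column names t5_pre and t5_post are hypothetical stand-ins for the two split Treasury-rate variables, and the numbers are invented. It shows that subset drops whole rows, so every column shrinks together, whereas assigning NA keeps the rows but leaves holes in the column.

```r
# Hypothetical toy data; names and values are illustrative only.
df <- data.frame(
  year    = c(1984, 1985, 1986, 1987, 1988),
  t5_pre  = c(12, 10, 0, 0, 0),   # rate before the break, 0 afterwards
  t5_post = c(0, 0, 7, 8, 9)      # 0 before the break, rate afterwards
)

# subset() filters rows: all columns lose the same rows.
pre_rows <- subset(df, t5_pre != 0)
nrow(pre_rows)             # rows remaining
sapply(pre_rows, length)   # every column has the same length

# Replacing the zeros with NA keeps every row, but the column
# now contains missing values instead of zeros.
df_na <- df
df_na$t5_pre[df_na$t5_pre == 0] <- NA
nrow(df_na)
sum(is.na(df_na$t5_pre))
```

So "blanking" a column is possible with NA, but it is a different operation from subset, and the rows with NA are then incomplete as far as a regression is concerned.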
There are plenty of ways to work around this restriction, but they won't work with lm. Think about how regression works: every observation used in the fit must be fully observed, with a value for the response and for every predictor. If you have missing data, you have three options. The default one is to drop the incomplete observations, which lm does automatically (by way of the na.omit function, applied inside the model.frame function, which is called inside lm).

You should be able to get help in this area from Cross Validated. But the fact remains: there is simply no way to use lm on variables of unequal length, and there is no way to get subset to return a data frame containing variables of unequal length, because all variables in a data frame must be the same length.
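As a rough illustration of what this means for the setup in the question, here is a sketch on simulated data. The names y, rate_pre, rate_post and post are hypothetical, and the response is just random noise. With the zeros left in, lm uses every row. With the zeros replaced by NA, each row is missing one of the two rate variables, so the default na.omit behaviour leaves no complete rows at all.

```r
set.seed(1)
n    <- 20
rate <- runif(n, 2, 10)                      # made-up "Treasury rate"
post <- rep(c(FALSE, TRUE), each = n / 2)    # hypothetical regime indicator

# Zero-coded split variables, as described in the question.
d <- data.frame(
  y         = rnorm(n),                      # placeholder response
  rate_pre  = ifelse(post, 0, rate),
  rate_post = ifelse(post, rate, 0)
)
fit_zero <- lm(y ~ rate_pre + rate_post, data = d)
nobs(fit_zero)                               # all rows are used

# NA-coded version: every row now has exactly one missing predictor.
d_na <- d
d_na$rate_pre[post]   <- NA
d_na$rate_post[!post] <- NA
sum(complete.cases(d_na))                    # rows lm could actually use

# lm drops incomplete rows by default, so nothing is left to fit.
fit_na <- try(lm(y ~ rate_pre + rate_post, data = d_na), silent = TRUE)
inherits(fit_na, "try-error")
```

Whether the zero-coded specification is the right model is a separate statistical question; the sketch only shows why NA-coding both variables cannot work mechanically with lm.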
Upvotes: 2