python_person

Reputation: 15

"Arguments imply differing number of rows" error after splitting data set into test and training sets

The following code is my attempt to split the 'Weekly' data set into training and testing sets. The training set is supposed to contain the years 1990-2008, and the testing set the years 2009-2010. Weekly is a data set available in R (it ships with the ISLR package).

weekly.train = split(Weekly, Weekly$Year == 1990:2008)
weekly.test = split(Weekly, Weekly$Year == 2009:2010)

When I fit a logistic regression model to the training set, I get this error:

"Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1037, 52"

Here's my code for the regression:

mod.fit.lr <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = weekly.train, family = binomial)

Upvotes: 0

Views: 176

Answers (1)

slava-kohut

Reputation: 4233

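split returns a list of data frames, one per group (here FALSE and TRUE), but glm needs a single data frame for its data argument, which is why the call fails. Note also that == compares Year with 1990:2008 element by element, recycling the shorter vector, rather than testing membership; %in% is the operator for that. You can either extract the TRUE element of the list or index the rows explicitly: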

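# TRUE for rows that fall in the test years (2009-2010)
i_test <- Weekly$Year %in% 2009:2010

# Logical row indexing returns ordinary data frames
weekly.test <- Weekly[i_test, ]
weekly.train <- Weekly[!i_test, ]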
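
To see the difference without loading Weekly, here is a minimal sketch on a small made-up data frame. The name toy and its columns are hypothetical stand-ins, not the real data. It shows that split gives back a list, that == with a year range silently recycles, and that logical indexing with %in% yields plain data frames:

```r
# Hypothetical stand-in for Weekly: 4 years, 3 rows each
toy <- data.frame(Year = rep(2007:2010, each = 3), x = 1:12)

# split() returns a list of data frames, one per group
parts <- split(toy, toy$Year %in% 2009:2010)
class(parts)   # "list"
names(parts)   # "FALSE" "TRUE"

# == recycles 2009:2010 along Year, so it misses matching rows
n_eq <- sum(toy$Year == 2009:2010)    # 4
n_in <- sum(toy$Year %in% 2009:2010)  # 6

# Logical indexing returns single data frames
i_test <- toy$Year %in% 2009:2010
test <- toy[i_test, ]
train <- toy[!i_test, ]

# Extracting the TRUE element of the list gives the same rows as test
test_from_split <- parts[["TRUE"]]
```

Either train or parts[["FALSE"]] can then be passed as the data argument of glm, since each is a single data frame.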

Upvotes: 0
