Reputation: 6668
I am new to R and feel a bit stupid asking this question.
Below is my code. Say my data is a matrix with 100 rows and 3 columns. I want to split it into training and test data, using the first 80 rows for training and the last 20 for testing.
However, when I run the code below, x_test and y_test both have 100 rows. Why?
data_dim <- dim(data_input)
split_row <- round(data_dim[1] * 0.8)
x_train <- data_input[1:split_row, 1 : data_dim[2]-1]
y_train <- data_input[1:split_row, data_dim[2]]
x_test <- data_input[split_row + 1 : data_dim[1], 1 : data_dim[2]-1]
y_test <- data_input[split_row + 1 : data_dim[1], data_dim[2]]
Upvotes: 1
Views: 168
Reputation: 3923
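The simplest fix is to add some parentheses to make it clear which rows you want. In R the : operator binds more tightly than + and -, so split_row + 1 : data_dim[1] is evaluated as split_row + (1:data_dim[1]). That is 100 indices (81 to 180), not the 20 indices from 81 to 100.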
set.seed(2020)
data_input <- matrix(runif(300), nrow = 100, ncol = 3)
data_dim <- dim(data_input)
split_row <- round(data_dim[1] * 0.8)
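# Columns: use 1:(n - 1). The original 1:n - 1 is 0:(n - 1), which only works because index 0 is dropped.
x_train <- data_input[1:split_row, 1:(data_dim[2] - 1)]
y_train <- data_input[1:split_row, data_dim[2]]
# Rows: use (split_row + 1):n, not split_row + 1:n
x_test <- data_input[(split_row + 1):data_dim[1], 1:(data_dim[2] - 1)]
y_test <- data_input[(split_row + 1):data_dim[1], data_dim[2]]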
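To see the precedence issue in isolation, here is a minimal base R sketch. The variable names (wrong_rows, right_rows, n_rows, n_cols) are made up for illustration and are not from the question.

```r
split_row <- 80
n_rows <- 100

# `:` is evaluated before `+`, so this is 80 + (1:100), i.e. 81, 82, ..., 180
wrong_rows <- split_row + 1:n_rows

# Parentheses force the addition first, giving 81, 82, ..., 100
right_rows <- (split_row + 1):n_rows

length(wrong_rows)  # 100
range(wrong_rows)   # 81 180
length(right_rows)  # 20

# The same trap applies to the column index
n_cols <- 3
wrong_cols <- 1:n_cols - 1    # (1:3) - 1, i.e. 0 1 2
right_cols <- 1:(n_cols - 1)  # 1 2
```

Indexing with 0 silently selects nothing, which is why the column version appears to work even without parentheses.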
caret::createDataPartition is a nice tool for this sort of thing.
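As a rough sketch of how that might look, assuming the caret package is installed: createDataPartition draws a random sample stratified on the outcome, so unlike the code above it does not take the first 80 rows. The name train_idx is just illustrative.

```r
# Assumption: caret is installed; the block is skipped otherwise.
if (requireNamespace("caret", quietly = TRUE)) {
  set.seed(2020)
  data_input <- matrix(runif(300), nrow = 100, ncol = 3)
  n_cols <- ncol(data_input)

  # p is the share of rows for training; list = FALSE returns a one-column
  # matrix of row indices, so take its first column as a plain vector
  train_idx <- caret::createDataPartition(data_input[, n_cols],
                                          p = 0.8, list = FALSE)[, 1]

  # Negative indices select everything except the listed rows or columns
  x_train <- data_input[train_idx, -n_cols]
  y_train <- data_input[train_idx, n_cols]
  x_test  <- data_input[-train_idx, -n_cols]
  y_test  <- data_input[-train_idx, n_cols]
}
```

If the row order matters (for example with time-ordered data), the explicit index ranges shown earlier are the safer choice.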
Upvotes: 3
Reputation: 4151
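I recommend looking into tidymodels. Its rsample package handles the split for you. Note that initial_split() assigns rows at random rather than taking the first 80%.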
library(tidyverse)
library(rsample)
mtcars_split <- mtcars %>% initial_split(prop = .8)
train <- mtcars_split %>% training()
test <- mtcars_split %>% testing()
# Separating outcome and predictors by hand should not be needed within tidymodels
y_train <- train %>% select(mpg)
x_train <- train %>% select(-mpg)
y_test <- test %>% select(mpg)
x_test <- test %>% select(-mpg)
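Since the question asks for the first 80% of rows rather than a random sample, an ordered split may be closer to what is wanted. A minimal sketch, assuming rsample is installed, using its initial_time_split(), which keeps the original row order:

```r
# Assumption: rsample is installed; the block is skipped otherwise.
if (requireNamespace("rsample", quietly = TRUE)) {
  # The first `prop` share of rows goes to training, the remainder to testing
  mtcars_split <- rsample::initial_time_split(mtcars, prop = 0.8)
  train <- rsample::training(mtcars_split)
  test  <- rsample::testing(mtcars_split)

  # Base R column selection, so no other packages are needed here
  y_train <- train[, "mpg", drop = FALSE]
  x_train <- train[, setdiff(names(train), "mpg")]
  y_test  <- test[, "mpg", drop = FALSE]
  x_test  <- test[, setdiff(names(test), "mpg")]
}
```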
Upvotes: 1