mHelpMe

Reputation: 6668

selecting a subset of a matrix

I am new to R and feel a bit stupid asking this question.

Below is my code. Say my data is a matrix with 100 rows and 3 columns. I want to split it into training and test sets, using the first 80 rows for training and the last 20 for testing.

However, when I run the code below, x_test and y_test both have 100 rows. Why?

data_dim <- dim(data_input)
split_row <- round(data_dim[1] * 0.8)

x_train <- data_input[1:split_row, 1 : data_dim[2]-1]
y_train <- data_input[1:split_row, data_dim[2]]
x_test <- data_input[split_row + 1 : data_dim[1], 1 : data_dim[2]-1]
y_test <- data_input[split_row + 1 : data_dim[1], data_dim[2]]

Upvotes: 1

Views: 168

Answers (2)

Chuck P

Reputation: 3923

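The simplest fix is to add parentheses so it is clear which rows you want. In R, the : operator is evaluated before binary + and -, so split_row + 1 : data_dim[1] is read as split_row + (1:data_dim[1]). That is a vector of 100 indices running from 81 to 180, not the 20 indices from 81 to 100.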

set.seed(2020)
data_input <- matrix(runif(300), nrow = 100, ncol = 3)
data_dim <- dim(data_input)
split_row <- round(data_dim[1] * 0.8)

# Parenthesise both ranges: ":" is evaluated before "+" and "-"
x_train <- data_input[1:split_row, 1:(data_dim[2] - 1)]
y_train <- data_input[1:split_row, data_dim[2]]
x_test <- data_input[(split_row + 1):data_dim[1], 1:(data_dim[2] - 1)]
y_test <- data_input[(split_row + 1):data_dim[1], data_dim[2]]
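
To see what the parentheses change, it can help to print the index vectors on their own. The snippet below is only a sketch: the sizes (100 rows, 3 columns) are assumed from the question, and wrong_rows / right_rows are names made up for illustration.

```r
# Assumed sizes, matching the 100 x 3 example in the question
n_rows <- 100
n_cols <- 3
split_row <- round(n_rows * 0.8)       # 80

# ":" is evaluated before binary "+" and "-"
wrong_rows <- split_row + 1:n_rows     # same as split_row + (1:n_rows)
right_rows <- (split_row + 1):n_rows   # the intended range

range(wrong_rows)    # 81 180
length(wrong_rows)   # 100
range(right_rows)    # 81 100
length(right_rows)   # 20

# The column index has the same quirk: 1:n_cols - 1 is (1:n_cols) - 1
1:n_cols - 1         # 0 1 2
1:(n_cols - 1)       # 1 2
```

The unparenthesised column index still selects columns 1 and 2, but only because R silently drops a 0 index, so it is safer to parenthesise it as well.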

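If a random split is acceptable, caret::createDataPartition is a nice tool for this sort of thing. It samples rows at random, stratified on the outcome, rather than taking the first 80.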

Upvotes: 3

Bruno

Reputation: 4151

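I recommend looking into tidymodels, whose rsample package handles train/test splits. Note that initial_split() assigns rows to the two sets at random.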

library(tidyverse)
library(rsample)

mtcars_split <- mtcars %>% initial_split(prop = .8)
train <- mtcars_split %>% training()
test <- mtcars_split %>% testing()

# Separating outcome and predictors by hand is usually not needed in tidymodels

y_train <- train %>% select(mpg)
x_train <- train %>% select(-mpg)

y_test <- test %>% select(mpg)
x_test <- test %>% select(-mpg)
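
If the training set has to be the first 80% of rows in their original order, as the question asks, rsample also offers initial_time_split(), which does not shuffle. The following is a minimal sketch on mtcars, assuming rsample is installed; the outcome column mpg is just carried over from the example above.

```r
library(rsample)

# initial_time_split() takes the first `prop` share of rows for training
# and the remaining rows for testing, without shuffling
mtcars_split <- initial_time_split(mtcars, prop = 0.8)
train <- training(mtcars_split)
test <- testing(mtcars_split)

# Outcome (mpg) and predictors, done in base R so only rsample is required
y_train <- train$mpg
x_train <- train[, setdiff(names(train), "mpg")]
y_test <- test$mpg
x_test <- test[, setdiff(names(test), "mpg")]
```

Because the split is by position, running it twice gives the same partition and no seed is needed.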

Upvotes: 1
