Dipojjal Saha

Reputation: 95

Splitting train and test data by a particular variable

I am using this code to split my data into train and test sets for a logistic regression:

```python
from sklearn.model_selection import train_test_split

# Split the data into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=10)
```

When splitting into train and test sets, I would like to split by issue_dt (the date the loan was issued), but issue_dt itself should not be used as a feature in the logistic regression. Any input on this would be appreciated.
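One possible reading of "split by issue_dt" is that all loans sharing an issue date should land on the same side of the split. Under that assumption, scikit-learn's GroupShuffleSplit can use issue_dt as the group label while the column is dropped from the features. A minimal sketch; the toy data below is made up for illustration:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: 10 loans issued on 5 distinct dates (2 loans per date).
X = pd.DataFrame({
    "issue_dt": pd.date_range("2015-01-01", periods=5, freq="MS").repeat(2),
    "income": [40, 55, 62, 38, 71, 45, 50, 66, 59, 48],
})
Y = pd.Series([0, 1, 1, 0, 1, 0, 0, 1, 1, 0], name="default")

# issue_dt is used only as the grouping key, never as a model feature.
groups = X["issue_dt"]
X_feat = X.drop(columns=["issue_dt"])

# test_size is the fraction of groups (dates) sent to the test set, not of rows.
gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=10)
train_idx, test_idx = next(gss.split(X_feat, Y, groups=groups))

X_train, X_test = X_feat.iloc[train_idx], X_feat.iloc[test_idx]
Y_train, Y_test = Y.iloc[train_idx], Y.iloc[test_idx]
```

Because whole dates are assigned to one side, no issue date appears in both X_train and X_test.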

Upvotes: 0

Views: 532

Answers (2)

La Fonse

Reputation: 1

You can try installing the caTools package (in R) and using its sample.split() function.

However, you need to pass your outcome variable (Y) and the split ratio. sample.split() returns a TRUE/FALSE vector, which you then use to subset the data:

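library(caTools)
# TRUE marks rows for the training set; class ratios of Species are preserved
train <- sample.split(iris$Species, SplitRatio = 0.7)
trainset <- subset(iris, train == TRUE)
testset <- subset(iris, train == FALSE)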
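sample.split() keeps the proportions of each class of Y roughly equal in both parts. A sketch of the closest equivalent for the question's Python code, assuming pandas inputs: train_test_split accepts a stratify argument. Note this is still a random split, not a split by date; issue_dt is simply dropped from the features first. The toy data is made up:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; issue_dt must stay out of the model's features.
X = pd.DataFrame({
    "issue_dt": pd.date_range("2015-01-01", periods=10, freq="MS"),
    "income": [40, 55, 62, 38, 71, 45, 50, 66, 59, 48],
})
Y = pd.Series([0, 1, 1, 0, 1, 0, 0, 1, 1, 0], name="default")

X_feat = X.drop(columns=["issue_dt"])

# stratify=Y keeps the 0/1 ratio of Y similar in train and test,
# roughly what caTools::sample.split does in R.
X_train, X_test, Y_train, Y_test = train_test_split(
    X_feat, Y, test_size=0.3, stratify=Y, random_state=10
)
```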

Upvotes: 0

Younng-Jin Kim

Reputation: 13

Assume X and Y are pandas objects (X a DataFrame, Y a Series or DataFrame) that share the same index.

Assume 'issue_dt' is a datetime column in X.

The following code

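X_drop = X.drop(columns=['issue_dt'])  # features without the date column
# True for loans issued before the cutoff, e.g., a_specific_date = X['issue_dt'].iloc[10]
ind = X['issue_dt'] < a_specific_date

X_train, X_test = X_drop[ind], X_drop[~ind]  # earlier loans train, later loans test
Y_train, Y_test = Y[ind], Y[~ind]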

might help you.
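A self-contained sketch of this approach, extended to fit the logistic regression on the remaining features. It assumes issue_dt can be parsed as a datetime; the toy data and the cutoff date 2015-07-01 are hypothetical. Training on older loans and testing on newer ones also mirrors how such a model would be applied to future loans.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one loan issued at the start of each month of 2015.
X = pd.DataFrame({
    "issue_dt": pd.date_range("2015-01-01", periods=12, freq="MS"),
    "income": [40, 55, 62, 38, 71, 45, 50, 66, 59, 48, 73, 35],
})
Y = pd.Series([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0], name="default")

# Ensure the column is a real datetime so "<" compares dates, not strings.
X["issue_dt"] = pd.to_datetime(X["issue_dt"])
cutoff = pd.Timestamp("2015-07-01")  # hypothetical cutoff date

# Boolean mask: True for loans issued before the cutoff.
ind = X["issue_dt"] < cutoff

# Drop issue_dt so it is used only for splitting, not as a feature.
X_drop = X.drop(columns=["issue_dt"])
X_train, X_test = X_drop[ind], X_drop[~ind]
Y_train, Y_test = Y[ind], Y[~ind]

model = LogisticRegression()
model.fit(X_train, Y_train)
print("test accuracy:", model.score(X_test, Y_test))
```

Make sure the training period contains both classes of Y; otherwise LogisticRegression cannot be fitted.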

Upvotes: 1
