Reputation: 95
I am using this code to split my data into train and test sets for a logistic regression:
"""
from sklearn.model_selection import train_test_split
# Split the data into test and train
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=10)
"""
Instead of a random split, I would like to split by issue_dt, a variable holding the loan's date of issue, but that variable should not be used as a feature in the logistic regression. Any input on this would be appreciated.
Upvotes: 0
Views: 532
Reputation: 1
If you are working in R, you can install the caTools package and use the sample.split() function.
You need to pass it your Y and the ratio you want for the split:
library(caTools)
train = sample.split(iris$Species, SplitRatio = 0.7)  # logical vector, TRUE marks a training row
trainset = subset(iris, train == TRUE)
testset = subset(iris, train == FALSE)
Upvotes: 0
Reputation: 13
Assuming X and Y are pandas objects that share the same index, and 'issue_dt' is a column in X, the following code
X_drop = X.drop(columns=['issue_dt'])
ind = X['issue_dt'] < a_specific_date # e.g., a_specific_date = X['issue_dt'].iloc[10]
X_train, X_test = X_drop[ind], X_drop[~ind]
Y_train, Y_test = Y[ind], Y[~ind]
might help you.
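For reference, here is a minimal end-to-end sketch of the same idea on made-up data. The column names loan_amnt, int_rate and default, the values, and the cutoff date are all hypothetical and only there so the snippet runs on its own; only issue_dt comes from the question.

```python
import pandas as pd

# Hypothetical toy data: ten loans issued monthly from June 2015 to March 2016.
df = pd.DataFrame({
    "issue_dt": pd.date_range("2015-06-01", periods=10, freq="MS"),
    "loan_amnt": [5000, 12000, 8000, 15000, 3000, 20000, 7000, 9000, 11000, 4000],
    "int_rate": [7.5, 12.1, 9.9, 15.3, 6.2, 18.0, 10.4, 11.7, 13.2, 8.8],
    "default": [0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
})

X = df.drop(columns=["default"])
Y = df["default"]

# Assumed cutoff: loans issued before this date go to train, the rest to test.
cutoff = pd.Timestamp("2016-01-01")
is_train = X["issue_dt"] < cutoff

# Build the mask first, then drop issue_dt so it never reaches the model.
X_model = X.drop(columns=["issue_dt"])
X_train, X_test = X_model[is_train], X_model[~is_train]
Y_train, Y_test = Y[is_train], Y[~is_train]

print(X_train.shape, X_test.shape)
```

X_train and Y_train can then be passed to the logistic regression as usual. Because the mask is computed from X before the column is dropped, issue_dt decides which set each row lands in but is never a feature.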
Upvotes: 1