huy

Reputation: 306

Split data into testing and training and convert to csv or excel files

I have a large dataset (around 200k rows) that I want to split randomly into two parts: 70% as training data and 30% as testing data. Is there a way to do this in Python? I also want to save both datasets as Excel or CSV files on my computer. Thanks!

Upvotes: 1

Views: 11686

Answers (2)

My Koryto

Reputation: 667

Start by importing the following:

from sklearn.model_selection import train_test_split
import pandas as pd

To split the data, you can use the train_test_split function from the scikit-learn package:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

where X (the features) and y (the target) are taken from your original DataFrame.
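For illustration, here is a minimal sketch of how X and y might be pulled out of a DataFrame before splitting. The toy DataFrame and the column name "target" are assumptions made up for this example; substitute your own data and label column.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy DataFrame standing in for the real dataset.
# "target" is a hypothetical label column, not something from the question.
df = pd.DataFrame({
    "feature_a": range(10),
    "feature_b": range(10, 20),
    "target": [0, 1] * 5,
})

# Features are every column except the label; the label is a single column.
X = df.drop(columns=["target"])
y = df["target"]

# 30% of the rows go to the test set; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```

The rows of X_train and y_train stay aligned, because train_test_split applies the same shuffle to every array passed to it.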

Later, you can export each of them as CSV using the pandas package:

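# Pass a file path; without one, to_csv only returns the CSV text as a string
X_train.to_csv('X_train.csv', index=False)
X_test.to_csv('X_test.csv', index=False)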

Same goes for y data as well.

EDIT: since you clarified that you need both the X and y columns in the same file, you can do the following:

train, test = train_test_split(yourdata, test_size=0.3, random_state=42)

and then export them to csv as I mentioned above.

Upvotes: 0

Rajat Agarwal

Reputation: 184

from sklearn.model_selection import train_test_split
# split the data into train and test sets (70% / 30%)
train, test = train_test_split(data, test_size=0.30, random_state=0)
# save each part to a CSV file in the current working directory
train.to_csv('train.csv', index=False)
test.to_csv('test.csv', index=False)
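Since the question also mentions Excel, here is a rough sketch that wraps the save step in a small helper handling both formats. The helper name save_split, the toy DataFrame, and the temporary output directory are all assumptions for illustration. The Excel branch relies on pandas' to_excel, which needs an Excel writer engine such as openpyxl installed, so the demo below only exercises the CSV branch.

```python
from pathlib import Path
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split


def save_split(train, test, out_dir, fmt="csv"):
    """Hypothetical helper: write train.<fmt> and test.<fmt> into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, frame in (("train", train), ("test", test)):
        path = out_dir / f"{name}.{fmt}"
        if fmt == "csv":
            frame.to_csv(path, index=False)
        elif fmt == "xlsx":
            # Requires an Excel writer engine such as openpyxl.
            frame.to_excel(path, index=False)
        else:
            raise ValueError(f"unsupported format: {fmt}")
        paths[name] = path
    return paths


# Toy data standing in for the real 200k-row dataset.
data = pd.DataFrame({"x": range(100), "y": [i % 2 for i in range(100)]})
train, test = train_test_split(data, test_size=0.30, random_state=0)

# Write to a temporary directory here; pass any folder you like instead.
out_dir = tempfile.mkdtemp()
paths = save_split(train, test, out_dir)
```

Calling save_split(train, test, out_dir, fmt="xlsx") would write Excel files instead. An Excel sheet holds at most 1,048,576 rows, so a 200k-row dataset fits in either part.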

Upvotes: 5
