Reputation: 355
I am new to TensorFlow and am trying to build a small project of my own. I would like to import my CSV file as a dataset, split it into training and test sets, and create batches from it.
My CSV file contains 3 columns of numbers, and I found these lines of code:
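import tensorflow as tf
filenames = ['mydata.csv']
record_defaults = [tf.float32] * 3  # one float32 type per selected column
# tf.data.experimental.CsvDataset (tf.contrib.data.CsvDataset in old TF 1.x); select_cols indices are 0-based
dataset = tf.data.experimental.CsvDataset(filenames, record_defaults, header=True, select_cols=[1, 2, 3])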
How do I convert this object into a tensor or a dataset so that I can split the data or create batches?
Upvotes: 0
Views: 206
Reputation: 378
As explained in the TensorFlow guide here, the object you have is already a dataset, so you can preprocess your data with the Dataset.map() transformation, applying a function you define. Shuffling and batching can then be done afterwards with dataset.shuffle(buffer_size=Buffer_Size) and dataset.batch(Batch_size). See the guide for further details.
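A minimal sketch of the map/shuffle/batch pipeline described above, assuming TensorFlow 2.x (eager execution) and `tf.data.experimental.CsvDataset`. The sample file, the choice of the first two columns as features and the third as the label, and the constants `BUFFER_SIZE`, `TRAIN_SIZE` and `BATCH_SIZE` are all hypothetical. For the train/test split it uses `Dataset.take()` and `Dataset.skip()`, which this answer does not mention; shuffling once with a seed and `reshuffle_each_iteration=False` keeps the two parts disjoint.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in for mydata.csv: a header row plus 3 numeric columns, 10 rows.
csv_path = os.path.join(tempfile.mkdtemp(), "mydata.csv")
with open(csv_path, "w") as f:
    f.write("a,b,c\n")
    for i in range(10):
        f.write(f"{i},{2 * i},{3 * i}\n")

# CsvDataset is already a tf.data.Dataset; select_cols indices are 0-based.
dataset = tf.data.experimental.CsvDataset(
    [csv_path], [tf.float32] * 3, header=True, select_cols=[0, 1, 2])

def to_features_label(a, b, c):
    # Assumption: the first two columns are features, the third is the label.
    return tf.stack([a, b]), c

dataset = dataset.map(to_features_label)

# Shuffle once in a fixed order so take/skip below see the same sequence.
BUFFER_SIZE = 10  # hypothetical; >= number of rows gives a full shuffle
dataset = dataset.shuffle(BUFFER_SIZE, seed=42, reshuffle_each_iteration=False)

# Split with take/skip: first 7 examples for training, the remaining 3 for testing.
TRAIN_SIZE = 7  # hypothetical
train_ds = dataset.take(TRAIN_SIZE)
test_ds = dataset.skip(TRAIN_SIZE)

BATCH_SIZE = 4  # hypothetical
train_batches = train_ds.batch(BATCH_SIZE)
test_batches = test_ds.batch(BATCH_SIZE)

train_batch_sizes = [int(x.shape[0]) for x, _ in train_batches]
test_batch_sizes = [int(x.shape[0]) for x, _ in test_batches]
print(train_batch_sizes, test_batch_sizes)
```

Calling `batch()` after `shuffle()` shuffles individual rows rather than whole batches; for training over several epochs you would typically shuffle `train_ds` again before batching.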
Upvotes: 1
Reputation: 22023
Use a tool to split your data, such as sklearn.model_selection.train_test_split:
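from sklearn.model_selection import train_test_split
# dataset must be a 2-D NumPy array (e.g. np.loadtxt('mydata.csv', delimiter=',', skiprows=1)), not a tf.data.Dataset
X_train, X_test, y_train, y_test = train_test_split(
    dataset[:, :2], dataset[:, 2], test_size=0.33, random_state=42)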
This assumes your dataset consists of two feature columns followed by one output label column.
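A sketch of this approach end to end, assuming NumPy, scikit-learn and TensorFlow 2.x are installed. The in-memory array is hypothetical data standing in for mydata.csv; the commented `np.loadtxt` line shows how a real file with one header row could be loaded instead. Each split is then wrapped with `tf.data.Dataset.from_tensor_slices` so it can still be shuffled and batched; `BATCH_SIZE` is a hypothetical value.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CSV: 10 rows, features in columns 0 and 1, label in column 2.
data = np.array([[i, 2 * i, 3 * i] for i in range(10)], dtype=np.float32)
# With a real file: data = np.loadtxt("mydata.csv", delimiter=",", skiprows=1, dtype=np.float32)

# Slice columns (not rows): data[:, :2] are the features, data[:, 2] is the label.
X_train, X_test, y_train, y_test = train_test_split(
    data[:, :2], data[:, 2], test_size=0.33, random_state=42)

# Wrap each split in a tf.data.Dataset so it can be shuffled and batched.
BATCH_SIZE = 4  # hypothetical
train_ds = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
            .shuffle(len(X_train))
            .batch(BATCH_SIZE))
test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(BATCH_SIZE)
```

With 10 rows and `test_size=0.33`, scikit-learn rounds the test size up, so 4 rows go to the test set and 6 to the training set.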
Upvotes: 1