user430953

Reputation: 355

What is the right way to import data to tensorflow?

I am new to TensorFlow and am trying to build my own small project. I would like to import my CSV file as a dataset, split it into training and testing sets, and create batches from it.
My CSV file contains 3 columns of numbers, and I managed to find these lines of code:

filenames = ['mydata.csv']
record_defaults = [tf.float32] * 3
dataset = tf.contrib.data.CsvDataset(filenames, record_defaults, header=True, select_cols=[1,2,3])

How do I convert this object to a tensor or a dataset, so that I can either split the data or create batches from it?

Upvotes: 0

Views: 206

Answers (2)

D_negn

Reputation: 378

As explained in the TensorFlow guide, the object you have is already a dataset. You can preprocess your data by passing a function you define to the Dataset.map() transformation. Batching and shuffling can be done afterwards using dataset.batch(Batch_size) and dataset.shuffle(buffer_size=Buffer_Size). You can read the guide for further details.
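A minimal sketch of these steps, assuming TensorFlow 2.x with eager execution. Because tf.contrib was removed in TensorFlow 2.x, the sketch uses tf.data.experimental.CsvDataset instead of the class in the question. The temporary CSV file and its contents, the helper name to_features_and_label, the choice of the third column as the label, and the 8/2 split are all made up for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny stand-in CSV (hypothetical data) so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "mydata.csv")
with open(csv_path, "w") as f:
    f.write("a,b,c\n")
    for i in range(10):
        f.write(f"{i}.0,{i * 2}.0,{i % 2}.0\n")

# One dtype per column: three required float columns.
record_defaults = [tf.float32] * 3
dataset = tf.data.experimental.CsvDataset(csv_path, record_defaults, header=True)

# Dataset.map(): turn each row (three scalars) into a (features, label) pair.
# Treating the third column as the label is an assumption for this example.
def to_features_and_label(a, b, c):
    return tf.stack([a, b]), c

dataset = dataset.map(to_features_and_label)

# Split by position first, then shuffle only the training part,
# so that no row can end up in both sets.
train_size = 8
train_ds = dataset.take(train_size).shuffle(buffer_size=train_size).batch(4)
test_ds = dataset.skip(train_size).batch(4)

# Materialize the batches as NumPy arrays to inspect them.
train_batches = [(x.numpy(), y.numpy()) for x, y in train_ds]
test_batches = [(x.numpy(), y.numpy()) for x, y in test_ds]
```

Splitting with take() and skip() before shuffling is a deliberate choice here: shuffling the full dataset first and splitting afterwards can mix training and test rows when the dataset is reshuffled on each pass.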

Upvotes: 1

Matthieu Brucher

Reputation: 22023

Use a tool such as sklearn.model_selection.train_test_split to split your data:

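from sklearn.model_selection import train_test_split
# Assumes dataset is a 2-D NumPy array: columns 0-1 are features, column 2 is the label.
X_train, X_test, y_train, y_test = train_test_split(
    dataset[:, :2], dataset[:, 2], test_size=0.33, random_state=42)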

This applies, for instance, if your dataset consists of two feature columns and one output label.
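As a self-contained sketch of that split, assuming the CSV contents have already been loaded into a 2-D NumPy array (for example with np.loadtxt). The array below is a hypothetical stand-in for the real file, and the names data, X, and y are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CSV contents: 9 rows, 3 numeric columns.
data = np.arange(27, dtype=np.float32).reshape(9, 3)

X = data[:, :2]  # first two columns: features
y = data[:, 2]   # third column: output label

# Rows are shuffled before splitting; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
```

If batches are needed afterwards, the resulting arrays can be wrapped with tf.data.Dataset.from_tensor_slices((X_train, y_train)) and then batched with Dataset.batch().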

Upvotes: 1
