Reputation: 806
I have three different .csv datasets that I typically read with pandas and use to train deep learning models. Each dataset is an n-by-m matrix, where n is the number of samples and m is the number of features. After reading the data, I do some reshaping and then feed it to my deep learning model using feed_dict:
import numpy as np
import pandas as pd
import tensorflow as tf

data1 = pd.DataFrame(np.random.uniform(low=0, high=1, size=(10, 3)), columns=['A', 'B', 'C'])
data2 = pd.DataFrame(np.random.uniform(low=0, high=1, size=(10, 3)), columns=['A', 'B', 'C'])
data3 = pd.DataFrame(np.random.uniform(low=0, high=1, size=(10, 3)), columns=['A', 'B', 'C'])
data = pd.concat([data1, data2, data3], axis=1)
# Some deep learning model that works with data
# An optimizer
with tf.compat.v1.Session() as sess:
    sess.run(init)
    sess.run(optimizer, feed_dict={SOME_VARIABLE: data})
However, my data is now too big to fit in memory, and I am wondering how I can use tf.data to read it instead of pandas. Apologies that the script above is pseudo-code rather than my actual code.
Upvotes: 5
Views: 8620
Reputation: 1236
Applicable to TF 2.0 and above. There are a few ways to create a Dataset from CSV files:
I believe you are currently reading the CSV files with pandas and then doing this:
tf.data.Dataset.from_tensor_slices(dict(pandaDF))
You can also try this out
tf.data.experimental.make_csv_dataset
Or this
tf.io.decode_csv
Also this
tf.data.experimental.CsvDataset
Details are here: Load CSV
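Of the options above, tf.data.experimental.make_csv_dataset is the most direct fit for data that does not fit in memory, because it streams batches from the files on disk. Here is a minimal sketch; the file names and column names are made up for the example, and three tiny CSVs are written to a temp directory just so the snippet is self-contained:

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tensorflow as tf

# Write three small CSV files to stand in for the real (large) datasets.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    df = pd.DataFrame(np.random.uniform(0, 1, size=(10, 3)),
                      columns=['A', 'B', 'C'])
    path = os.path.join(tmp_dir, f'data{i}.csv')
    df.to_csv(path, index=False)
    paths.append(path)

# make_csv_dataset streams the files in batches instead of loading
# everything into memory. num_epochs=1 makes the dataset finite;
# shuffle=False just keeps the toy output deterministic.
dataset = tf.data.experimental.make_csv_dataset(
    paths,
    batch_size=5,
    num_epochs=1,
    shuffle=False,
)

# Each element is a dict mapping column name -> tensor of shape (batch_size,)
for batch in dataset.take(1):
    print({name: tensor.shape for name, tensor in batch.items()})
```

In a real pipeline you would pass your actual file paths (or a glob pattern) and typically set label_name to the target column so the dataset yields (features, label) pairs.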
If you need to do processing with pandas prior to loading, you can keep your current approach, but instead of doing pd.concat([data1, data2, data3], axis=1), use the concatenate function. Note that Dataset.concatenate appends one dataset after another (row-wise, like pd.concat with axis=0), so the samples are stacked rather than the features widened:
import numpy as np
import pandas as pd
import tensorflow as tf

data1 = pd.DataFrame(np.random.uniform(low=0, high=1, size=(10, 3)), columns=['A', 'B', 'C'])
data2 = pd.DataFrame(np.random.uniform(low=0, high=1, size=(10, 3)), columns=['A', 'B', 'C'])
data3 = pd.DataFrame(np.random.uniform(low=0, high=1, size=(10, 3)), columns=['A', 'B', 'C'])
tf_dataset = tf.data.Dataset.from_tensor_slices(dict(data1))
tf_dataset = tf_dataset.concatenate(tf.data.Dataset.from_tensor_slices(dict(data2)))
tf_dataset = tf_dataset.concatenate(tf.data.Dataset.from_tensor_slices(dict(data3)))
More about concatenate
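Once concatenated, the combined dataset can be batched and iterated like any other tf.data.Dataset. A small sketch (two toy DataFrames instead of your real data, so the row counts here are illustrative only):

```python
import numpy as np
import pandas as pd
import tensorflow as tf

data1 = pd.DataFrame(np.random.uniform(0, 1, size=(10, 3)), columns=['A', 'B', 'C'])
data2 = pd.DataFrame(np.random.uniform(0, 1, size=(10, 3)), columns=['A', 'B', 'C'])

# Build one dataset per DataFrame and append them end-to-end:
# 10 rows + 10 rows = 20 elements in total.
ds = tf.data.Dataset.from_tensor_slices(dict(data1))
ds = ds.concatenate(tf.data.Dataset.from_tensor_slices(dict(data2)))

# Batch the 20 rows; each element is a dict of
# column name -> tensor of shape (batch_size,).
ds = ds.batch(4)
for batch in ds:
    print(batch['A'].shape)
```

This is also the shape a Keras model expects if you later call model.fit(ds) with a (features, label) structure instead of a plain dict.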
Upvotes: 5