I am working on a system that reads data from a CSV file and writes it into a TFRecord file, and I have a few questions.
First, which data types can a TFRecord file store? The column types are lost when the data goes through CSV.
Second, how can I convert a column of dtype object into a type that a TFRecord accepts?
I have two columns (example below) of dtype object that hold strings. How can I convert that data to the correct type for a TFRecord?
When importing, I am hoping to append the data one row at a time to the TFRecord file. Any advice or documentation would be great. I have been looking at this problem for some time, and it seems that only ints and floats can go into a TFRecord, but what about a list/array of integers?
Thank you for reading!
Quick note: I am using pandas to create a DataFrame from the CSV file.
Some example code I am using:
import pandas as pd
from ast import literal_eval
import numpy as np
import tensorflow as tf
tf.compat.v1.enable_eager_execution()
def Start():
    db = pd.read_csv("I:\Github\ClubKeno\Keno Project\Database\..\LotteryDatabase.csv")
    pd.DataFrame = db
    print(db['Winning_Numbers'])
    print(db.dtypes)
    training_dataset = (
        tf.data.Dataset.from_tensor_slices(
            (
                tf.cast(db['Draw_Number'].values, tf.int64),
                tf.cast(db['Winning_Numbers'].values, tf.int64),
                tf.cast(db['Extra_Numbers'].values, tf.int64),
                tf.cast(db['Kicker'].values, tf.int64)
            )
        )
    )
    for features_tensor, target_tensor in training_dataset:
        print(f'features:{features_tensor} target:{target_tensor}')
Error Message:
Update: I got two columns of data working using the following function:
dataset = tf.data.experimental.make_csv_dataset(
    file_pattern=databasefile,
    column_names=['Draw_Number', 'Kicker'],
    column_defaults=[tf.int64, tf.int64],
)
However, when I try to include my two other columns, which have dtype object, I get an error.
This is what the data looks like in both of those columns:
"3,9,11,16,25,26,28,29,36,40,41,46,63,66,67,69,72,73,78,80"
Here is the function I tried for those columns:
dataset = tf.data.experimental.make_csv_dataset(
    file_pattern=databasefile,
    column_names=['Draw_Number', 'Winning_Numbers', 'Extra_Numbers', 'Kicker'],
    column_defaults=[tf.int64, tf.compat.as_bytes, tf.compat.as_bytes, tf.int64],
    header=True,
    batch_size=100,
    field_delim=',',
    na_value='NA'
)
This error appears:
TypeError: Failed to convert object of type <class 'function'> to Tensor. Contents: <function as_bytes at 0x000000EA530908C8>. Consider casting elements to a supported type.
Should I cast those two columns outside the function and combine them later into the TFRecord file, alongside the tf.data dataset from the make_csv_dataset function?
Upvotes: 1
Views: 593
Reputation: 2507
For starters, I need to know what type a TFRecord file can take, when using CSV types are removed.
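A TFRecord stores tf.train.Example messages, and each feature in an Example uses one of three list types: tf.train.BytesList (string, byte), tf.train.FloatList (float32, float64) and tf.train.Int64List (bool, enum, int32, uint32, int64, uint64). Each list can hold several values, so a list/array of integers fits into a single Int64List. This is discussed here.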
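As a rough illustration of the three list types, here is a minimal sketch assuming TensorFlow 2.x. The helper names (int64_feature, float_feature, bytes_feature), the Average feature and the sample values are made up for this example and are not part of the question's data.

```python
import tensorflow as tf

# Hypothetical helpers: each wraps a Python list in one of the three
# list types that tf.train.Feature supports.
def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

def float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))

def bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=list(values)))

# One made-up record: a scalar int, a list of ints, a float and a string.
example = tf.train.Example(features=tf.train.Features(feature={
    "Draw_Number": int64_feature([1]),
    "Winning_Numbers": int64_feature([3, 9, 11]),
    "Average": float_feature([7.5]),
    "Label": bytes_feature([b"sample"]),
}))

# Round-trip through the serialized form that a TFRecord file stores.
serialized = example.SerializeToString()
restored = tf.train.Example.FromString(serialized)
numbers = list(restored.features.feature["Winning_Numbers"].int64_list.value)
label = restored.features.feature["Label"].bytes_list.value[0]
```

A scalar is stored as a one-element list, which is why Draw_Number is wrapped in [1].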
Secondly, How can I convert data type:object into a type that a TFRecord can take?
Here is an example from TF. It is a lot to digest at once, but it is easy to follow if you read it carefully.
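One possible way to do the conversion row by row is sketched below, assuming TensorFlow 2.x with eager execution. The rows, the row_to_example helper and the output file name are hypothetical. With pandas, similar dictionaries could come from db.to_dict("records").

```python
import os
import tempfile
import tensorflow as tf

# Made-up rows standing in for the CSV data.
rows = [
    {"Draw_Number": 1, "Winning_Numbers": "3,9,11", "Kicker": 12345},
    {"Draw_Number": 2, "Winning_Numbers": "4,8,15,16", "Kicker": 67890},
]

def row_to_example(row):
    # Hypothetical helper: split the comma-separated string into ints
    # so the column can be stored as an Int64List.
    numbers = [int(n) for n in row["Winning_Numbers"].split(",")]
    feature = {
        "Draw_Number": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[row["Draw_Number"]])),
        "Winning_Numbers": tf.train.Feature(
            int64_list=tf.train.Int64List(value=numbers)),
        "Kicker": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[row["Kicker"]])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Write one serialized Example per row.
record_path = os.path.join(tempfile.mkdtemp(), "lottery_sample.tfrecord")
with tf.io.TFRecordWriter(record_path) as writer:
    for row in rows:
        writer.write(row_to_example(row).SerializeToString())

# Read the file back; VarLenFeature handles lists of varying length.
schema = {
    "Draw_Number": tf.io.FixedLenFeature([], tf.int64),
    "Winning_Numbers": tf.io.VarLenFeature(tf.int64),
    "Kicker": tf.io.FixedLenFeature([], tf.int64),
}
restored = []
for serialized in tf.data.TFRecordDataset(record_path):
    parsed = tf.io.parse_single_example(serialized, schema)
    restored.append(tf.sparse.to_dense(parsed["Winning_Numbers"]).numpy().tolist())
```

If every row always has the same count of numbers, a FixedLenFeature with that length would also work when reading and returns a dense tensor directly.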
have two columns (will post example below) of two objects types that are strings, How can I convert that data to the correct type for TFRecords?
For string data you need tf.train.BytesList, which stores the string as a bytes_list once it has been encoded to bytes.
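For a column like the one in the question there are two options, sketched here assuming TensorFlow 2.x: keep the raw string in a BytesList, or split it and store the integers in an Int64List. The sample string is a shortened, made-up version of the data shown in the question.

```python
import tensorflow as tf

raw = "3,9,11,16,25"

# Option 1: keep the text as-is. BytesList needs bytes, so encode first.
as_bytes = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=[raw.encode("utf-8")]))

# Option 2: parse the text and store the numbers themselves.
numbers = [int(n) for n in raw.split(",")]
as_ints = tf.train.Feature(int64_list=tf.train.Int64List(value=numbers))

# The bytes option decodes back to the original string.
decoded = as_bytes.bytes_list.value[0].decode("utf-8")
```

The second option is usually more convenient for training, since the values come back as integers without another parsing step.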
When importing Im hoping to append data from each row at a time into the TFRecord file, any advice or documentation would be great, I have been looking for some time at this problem and it seems there can only be ints,floats inputted into a TFRecord but what about a list/array of Integers?
Quick Note, I am using PANDAS to create a dataframe of the CSV file
Instead of reading the CSV file with pandas, I would recommend tf.data.experimental.make_csv_dataset, defined here. It should make the conversion much faster than pandas and cause fewer compatibility issues with TF classes. With this function you do not need to process the CSV file row by row; you can transform all of it with map(), which traces the function into a graph rather than running it eagerly. This is a good tutorial to get started.
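The TypeError in the question comes from passing the function tf.compat.as_bytes in column_defaults, which expects dtypes or default-value tensors. A minimal sketch of one workaround, assuming TensorFlow 2.x: read the two columns as tf.string and split them inside map(). The sample CSV contents and the split_numbers helper are made up for illustration.

```python
import os
import tempfile
import tensorflow as tf

# Write a tiny made-up CSV with the same column layout as the question.
csv_path = os.path.join(tempfile.mkdtemp(), "lottery_sample.csv")
with open(csv_path, "w") as f:
    f.write("Draw_Number,Winning_Numbers,Extra_Numbers,Kicker\n")
    f.write('1,"3,9,11","16,25",12345\n')
    f.write('2,"4,8,15","23,42",67890\n')

dataset = tf.data.experimental.make_csv_dataset(
    file_pattern=csv_path,
    column_names=["Draw_Number", "Winning_Numbers", "Extra_Numbers", "Kicker"],
    # tf.string (a dtype), not tf.compat.as_bytes (a function).
    column_defaults=[tf.int64, tf.string, tf.string, tf.int64],
    header=True,
    batch_size=2,
    num_epochs=1,   # the default repeats the data indefinitely
    shuffle=False,  # keep file order so the result is predictable
)

def split_numbers(batch):
    # Hypothetical helper: turn "3,9,11" into integers for each row.
    out = dict(batch)
    for name in ("Winning_Numbers", "Extra_Numbers"):
        parts = tf.strings.split(batch[name], sep=",")  # RaggedTensor of strings
        out[name] = tf.ragged.map_flat_values(
            tf.strings.to_number, parts, out_type=tf.int64)
    return out

parsed = dataset.map(split_numbers)
first = next(iter(parsed))
winning = first["Winning_Numbers"].to_list()
kickers = first["Kicker"].numpy().tolist()
```

The split columns come back as ragged tensors, so rows with different counts of numbers are handled without padding.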
Upvotes: 1