Reputation: 7922
I am following this tutorial: Classify structured data with feature columns. I made it work for the original data, and now I am implementing it using my own data. I have run into a problem, though, which I think can be traced back to the datatype appearing last in a tensorflow.python.data.ops.dataset_ops.BatchDataset object. What object is this datatype referring to?
What I have checked so far:
map(): that's not going to help here.
tf.data.Dataset documentation: I see a good chance that the answer is there, but I haven't found it yet.
# necessary imports
import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
Following the tutorial, let's create the dataframe (the data is openly available just by running the following code):
dataset_url = 'http://storage.googleapis.com/download.tensorflow.org/data/petfinder-mini.zip'
csv_file = 'datasets/petfinder-mini/petfinder-mini.csv'
tf.keras.utils.get_file('petfinder_mini.zip', dataset_url,
                        extract=True, cache_dir='.')
dataframe = pd.read_csv(csv_file)
The first 3 rows, for illustration:
+----+--------+-------+----------------------+----------+----------+----------+----------------+-------------+--------------+--------------+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------------+
| | Type | Age | Breed1 | Gender | Color1 | Color2 | MaturitySize | FurLength | Vaccinated | Sterilized | Health | Fee | Description | PhotoAmt | AdoptionSpeed |
|----+--------+-------+----------------------+----------+----------+----------+----------------+-------------+--------------+--------------+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------------|
| 0 | Cat | 3 | Tabby | Male | Black | White | Small | Short | No | No | Healthy | 100 | Nibble is a 3+ month old ball of cuteness. He is energetic and playful. I rescued a couple of cats a few months ago but could not get them neutered in time as the clinic was fully scheduled. The result was this little kitty. I do not have enough space and funds to care for more cats in my household. Looking for responsible people to take over Nibble's care. | 1 | 2 |
| 1 | Cat | 1 | Domestic Medium Hair | Male | Black | Brown | Medium | Medium | Not Sure | Not Sure | Healthy | 0 | I just found it alone yesterday near my apartment. It was shaking so I had to bring it home to provide temporary care. | 2 | 0 |
| 2 | Dog | 1 | Mixed Breed | Male | Brown | White | Medium | Medium | Yes | No | Healthy | 0 | Their pregnant mother was dumped by her irresponsible owner at the roadside near some shops in Subang Jaya. Gave birth to them at the roadside. They are all healthy and adorable puppies. Already dewormed, vaccinated and ready to go to a home. No tying or caging for long hours as guard dogs. However, it is acceptable to cage or tie for precautionary purposes. Interested to adopt pls call me. | 7 | 3 |
+----+--------+-------+----------------------+----------+----------+----------+----------------+-------------+--------------+--------------+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------------+
(Printed using this.)
Create an additional column and drop unnecessary columns, as in the tutorial:
dataframe['target'] = np.where(dataframe['AdoptionSpeed']==4, 0, 1)
dataframe = dataframe.drop(columns=['AdoptionSpeed', 'Description'])
Define a function which turns our data into TensorFlow datasets:
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('target')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
Split data:
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
Use our function:
ds = df_to_dataset(train)
The type of ds (i.e. type(ds)) is:
tensorflow.python.data.ops.dataset_ops.BatchDataset
Let's print it as well:
print(ds)
I get (inserted linebreaks for readability):
<BatchDataset shapes:
({Type: (None,),
Age: (None,),
Breed1: (None,),
Gender: (None,),
Color1: (None,),
Color2: (None,),
MaturitySize: (None,),
FurLength: (None,),
Vaccinated: (None,),
Sterilized: (None,),
Health: (None,),
Fee: (None,),
PhotoAmt: (None,)},
(None,)),
types:
({Type: tf.string,
Age: tf.int64,
Breed1: tf.string,
Gender: tf.string,
Color1: tf.string,
Color2: tf.string,
MaturitySize: tf.string,
FurLength: tf.string,
Vaccinated: tf.string,
Sterilized: tf.string,
Health: tf.string,
Fee: tf.int64,
PhotoAmt: tf.int64},
tf.int64)>
What object exactly is the final tf.int64 referring to?
When I use my own dataset, the last element of the ds, produced in the same way, is tf.string, which I believe is the cause of my later problems.
Upvotes: 1
Views: 207
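It is the dtype of the labels component of each dataset element. Your ds yields (features, labels) tuples, so the final tf.int64 describes the labels tensor built from the target column, not one of the feature columns. The tf.data API reports this structure as tf.TypeSpec objects.
You can also inspect the type of each individual component using Dataset.element_spec.
Below is an example using the same data as above.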
print(ds.element_spec)
Result is:
({'Age': TensorSpec(shape=(None,), dtype=tf.int64, name=None),
'Breed1': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'Color1': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'Color2': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'Fee': TensorSpec(shape=(None,), dtype=tf.int64, name=None),
'FurLength': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'Gender': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'Health': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'MaturitySize': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'PhotoAmt': TensorSpec(shape=(None,), dtype=tf.int64, name=None),
'Sterilized': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'Type': TensorSpec(shape=(None,), dtype=tf.string, name=None),
'Vaccinated': TensorSpec(shape=(None,), dtype=tf.string, name=None)},
TensorSpec(shape=(None,), dtype=tf.int64, name=None))
Each entry inside the dictionary is the TensorSpec of one feature column, and the last TensorSpec, outside the dictionary, describes the labels. If your own dataset shows tf.string in that position, your label column contains strings.
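To see how the last component follows the labels, here is a minimal sketch. The helper label_spec_for and the toy values are made up for illustration and are not part of the tutorial; it builds a tiny (features, labels) dataset in the same way as df_to_dataset and returns the spec of the labels component.

```python
import numpy as np
import tensorflow as tf


def label_spec_for(labels):
    """Build a batched (features, labels) dataset and return the labels spec.

    Hypothetical helper: the single 'Age' feature stands in for the real columns.
    """
    features = {"Age": np.array([3, 1, 1], dtype=np.int64)}
    ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)
    # element_spec mirrors the element structure: (dict of features, labels)
    _, label_spec = ds.element_spec
    return label_spec


# Integer labels, like the tutorial's 0/1 target column
int_spec = label_spec_for(np.array([1, 0, 1], dtype=np.int64))
# String labels, which is presumably what your own label column holds
str_spec = label_spec_for(["yes", "no", "yes"])

print(int_spec.dtype)
print(str_spec.dtype)
```

Only the labels argument changes between the two calls, and only the dtype of the last component changes with it.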
You can find more details about dataset structure in TensorFlow's guide here.
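If string labels do turn out to be the cause of your later problems, one possible fix is to encode them as integers before building the dataset. The sketch below assumes a made-up dataframe with a string-valued target column; pd.factorize is only one of several ways to do the encoding.

```python
import pandas as pd

# Hypothetical data standing in for your own dataframe
df = pd.DataFrame({
    "Age": [3, 1, 1],
    "target": ["adopted", "not_adopted", "adopted"],
})

# Map each distinct label to an integer code, in order of first appearance.
# `classes` keeps the lookup from code back to the original label.
codes, classes = pd.factorize(df["target"])
df["target"] = codes

print(df["target"].tolist())
print(list(classes))
```

After this step, popping target inside df_to_dataset gives an integer column, so the last component of the dataset should no longer be tf.string.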
Upvotes: 1