Reputation: 3868
Using the TensorFlow 2.0 alpha, I received the error ValueError: Can't convert Python sequence with mixed types to Tensor when I tried to create a tf.data.Dataset from the following data:
Inspect the complete dataset on Kaggle
Obviously, there are mixed data types: Sex is a string, Age is a float/double, SibSp and Parch are integers, and so on.
My (Python 3) code to transform this pandas DataFrame into a tf.data.Dataset is based on TensorFlow's tutorial on how to classify structured data, and looks like this:
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    # the 'Survived' column is the label (not shown in the image of the DataFrame, but it exists in the DataFrame)
    label = dataframe.pop('Survived')
    # create the dataset from the dataframe
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), label))
    # if shuffle is True, randomize the entries
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
As mentioned above, this function throws the error ValueError: Can't convert Python sequence with mixed types to Tensor when executed with, for instance:
train_ds = df_to_dataset(df_train, batch_size=32)
(where df_train is the pandas DataFrame shown in the image)
Now I wonder whether I am missing something, because TensorFlow's tutorial (mentioned above) also uses a DataFrame with mixed types, yet I ran into no errors when trying that example with exactly the same df_to_dataset function.
Upvotes: 3
Views: 1764
Reputation: 4533
This error is caused by NaN values in specific columns. Detect them with dataframe['Name'].isnull().sum() and replace them.
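As a minimal sketch of that check-and-replace step: the small df_train below is a made-up stand-in for the Titanic data, and fill_missing is a hypothetical helper, not part of pandas or TensorFlow. The fill values (column median for numeric columns, the string "unknown" for the rest) are only one possible choice. The likely reason the NaNs matter is that a string column containing NaN holds both str and float objects, which is what the conversion to a Tensor rejects.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the Titanic training frame.
df_train = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", None, "male"],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "SibSp": [1, 1, 0, 0],
})

# Count missing values per column; non-zero entries are the suspects.
nan_counts = df_train.isnull().sum()
print(nan_counts[nan_counts > 0])


def fill_missing(dataframe):
    """Return a copy with NaNs replaced (assumed strategy, adjust as needed):
    the column median for numeric columns, the string 'unknown' otherwise."""
    filled = dataframe.copy()
    for column in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[column]):
            filled[column] = filled[column].fillna(filled[column].median())
        else:
            filled[column] = filled[column].fillna("unknown")
    return filled


df_clean = fill_missing(df_train)
```

After cleaning, df_clean should contain no missing values, so it can be passed to df_to_dataset in place of df_train.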
Upvotes: 3