Eyk Rehbein

Reputation: 3868

Using different data types in EagerTensor

Using the TensorFlow 2.0 alpha, I received the error ValueError: Can't convert Python sequence with mixed types to Tensor when trying to create a tf.data.Dataset from the following data:

[Image: preview of the pandas DataFrame]

Inspect the complete dataset on Kaggle

Obviously, the data types are mixed: Sex is a string, Age is a float/double, SibSp and Parch are integers, and so on.

My (Python 3) code to transform this pandas DataFrame into a tf.data.Dataset is based on TensorFlow's tutorial on how to classify structured data, and looks like this:

import tensorflow as tf

def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()

  # the 'Survived' column is the label (not shown in the image of the Dataframe but exists in the Dataframe)
  label = dataframe.pop('Survived')

  # create the dataset from the dataframe
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), label))

  # if shuffle == true, randomize the entries
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)

  return ds

As mentioned above, this function throws the error ValueError: Can't convert Python sequence with mixed types to Tensor when called with, for instance:

train_ds = df_to_dataset(df_train, batch_size=32) 

(where df_train is the pandas DataFrame shown in the image)

Now I wonder if I am missing something, because TensorFlow's tutorial (mentioned above) also uses a DataFrame with mixed types, yet I ran into no errors when trying that example with exactly the same df_to_dataset function.

Upvotes: 3

Views: 1764

Answers (1)

Sharky

Reputation: 4533

This error is caused by NaN values in specific columns. Detect them with, for example, dataframe['Name'].isnull().sum(), and replace them.
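As a minimal sketch of the detect-and-replace step, the snippet below counts missing values per column and then fills them. The small DataFrame, the helper name fill_missing, and the fill strategy (median for numeric columns, the placeholder "unknown" for everything else) are assumptions made for illustration, not part of the original question or answer. A string column that also contains NaN mixes str and float values, which is the likely reason for the "mixed types" message.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for df_train (values are made up).
df = pd.DataFrame({
    "Name": ["Braund", None, "Heikkinen"],
    "Sex": ["male", "female", None],
    "Age": [22.0, np.nan, 26.0],
    "SibSp": [1, 1, 0],
})

# Count missing values per column and show only the affected columns.
missing = df.isnull().sum()
print(missing[missing > 0])


def fill_missing(frame):
    """Return a copy with NaN replaced.

    Assumed strategy: median for numeric columns, the string
    "unknown" for all other columns.
    """
    frame = frame.copy()
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            frame[col] = frame[col].fillna(frame[col].median())
        else:
            frame[col] = frame[col].fillna("unknown")
    return frame


clean = fill_missing(df)
```

After a step like this, the cleaned frame (clean here, df_train in the question) can be passed to df_to_dataset as before. Whether a median or a placeholder string is an appropriate fill value depends on the model, so treat the choice above as one option among several.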

Upvotes: 3
