sapbucket

Reputation: 7195

TensorFlow: Can data sets contain string category values?

From the TensorFlow examples, it is easy to see that data sets can contain numeric values. For example:

x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]

However, does it also work with string category values? For example:

x_train = ["sunny", "rainy", "sunny", "cloudy"]
y_train = ["go outside", "stay inside", "go outside", "go outside"]

If it does not, I assume that TensorFlow has some methodology for working with categorical values, perhaps a clever trick such as converting them to numeric values in some systematic way.

Upvotes: 1

Views: 266

Answers (1)

mrry

Reputation: 126154

Yes, TensorFlow does support datasets with categorical features. Perhaps the easiest way to work with them is to use the Feature Column API, which provides methods such as tf.feature_column.categorical_column_with_vocabulary_list() (for dealing with small, known sets of categories) and tf.feature_column.categorical_column_with_hash_bucket() (for dealing with large and potentially unbounded sets of categories).
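To make the idea behind those two column types concrete, here is a minimal plain-Python sketch of the two encoding strategies applied to the data from the question. It is not TensorFlow's implementation, and it does not call the Feature Column API. The helper names (build_vocabulary, one_hot, hash_bucket), the choice of MD5 as the hash, and the bucket count of 10 are all assumptions made for illustration only.

```python
import hashlib

def build_vocabulary(values):
    """Map each distinct category to an integer ID, in order of first appearance."""
    vocab = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab)
    return vocab

def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at position `index`, 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]

def hash_bucket(value, num_buckets):
    """Assign a string to one of `num_buckets` IDs without needing a vocabulary.

    MD5 is used here only because it is deterministic across runs
    (an illustrative assumption; it is not the hash TensorFlow uses).
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

x_train = ["sunny", "rainy", "sunny", "cloudy"]
y_train = ["go outside", "stay inside", "go outside", "go outside"]

# Strategy 1: small, known set of categories -> vocabulary lookup, then one-hot.
weather_vocab = build_vocabulary(x_train)
x_encoded = [one_hot(weather_vocab[v], len(weather_vocab)) for v in x_train]

# Labels can be mapped to integer class IDs the same way.
label_vocab = build_vocabulary(y_train)
y_encoded = [label_vocab[v] for v in y_train]

# Strategy 2: large or unbounded set of categories -> hash into a fixed number of buckets.
# Different strings may collide in the same bucket; that is the trade-off of hashing.
x_hashed = [hash_bucket(v, 10) for v in x_train]

print(x_encoded)
print(y_encoded)
print(x_hashed)
```

The vocabulary approach gives every category its own ID but requires knowing the categories up front. The hashing approach needs no such list, at the cost of possible collisions between unrelated categories.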

Upvotes: 1
