Using CSV data as input to TensorFlow recommender

Question

I'm trying to replicate the quickstart recommender with my own CSV data, which I load with the pandas read_csv function.

Reading the csv data works and I can inspect it, e.g.

my_file_train = pd.read_csv("my_file.csv",header=0)

and when I view .head(), the data appears as expected. my_file_train is a pandas DataFrame.

Following the approach taken in "How do I go from Pandas DataFrame to Tensorflow BatchDataset for NLP?", I can get a Dataset from the pandas DataFrame:

training_dataset = (
    tf.data.Dataset.from_tensor_slices(
        (
            tf.cast(my_file_train['feature1'].values, tf.string),
            tf.cast(my_file_train['user_id'].values, tf.int64)
        )
    )
)

training_dataset is then a tf.data.Dataset whose elements are (feature1, user_id) tuples.

Next I try to build vocabularies as in the example, where we see code like this:

user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))

I had thought that I could do something similar:

user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(training_dataset.map(lambda x: x[1]))

since my Dataset holds tuples rather than dictionaries, but I get the following error:

TypeError: <lambda>() takes 1 positional argument but 2 were given

which probably just exposes that I'm taking completely the wrong approach somewhere, but I'd be very grateful if anyone could set me on track.

Would it be simpler to create my own tfds dataset a la https://www.tensorflow.org/datasets/add_dataset rather than converting it on the fly? Or is there something simple that I'm missing in the manipulation I'm trying to do?

Kaveh · Accepted Answer

I think this error occurs because your dataset elements are tuples of (features, labels), so the dataset's map function unpacks each element and passes 2 arguments to the lambda. The error is saying "your map function takes 1 argument but 2 were given".

You may try this:

training_dataset.map(lambda x,y: y)

instead of training_dataset.map(lambda x: x[1]), so that the lambda accepts both tuple components and returns the second one.
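To make the difference concrete, here is a minimal sketch of both options: keeping the tuple dataset and giving the lambda two parameters, or building a dataset of dictionaries so that the quickstart's x["user_id"] pattern can be used unchanged. Several things here are assumptions rather than part of the question: the inline CSV text stands in for my_file.csv, the ids are converted to strings because StringLookup works on string input (tf.keras.layers.IntegerLookup would be the alternative for integer ids), and the layer is referenced as tf.keras.layers.StringLookup, which assumes a TensorFlow release that exposes it outside the experimental.preprocessing path used in the question.

```python
import io

import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for my_file.csv; column names follow the question.
csv_text = "feature1,user_id\nfoo,1\nbar,2\nbaz,1\n"
df = pd.read_csv(io.StringIO(csv_text), header=0)

# Assumption: ids are converted to strings so StringLookup can consume them
# (the question casts user_id to int64 instead).
features = df["feature1"].astype(str).tolist()
user_ids = df["user_id"].astype(str).tolist()

# Option 1: a dataset of tuples. map() unpacks each tuple into separate
# arguments, so the mapped function needs two parameters.
tuple_ds = tf.data.Dataset.from_tensor_slices((features, user_ids))
ids_from_tuples = tuple_ds.map(lambda feature, user_id: user_id)

# Option 2: a dataset of dicts. map() passes each dict as a single argument,
# which matches the x["user_id"] pattern from the quickstart.
dict_ds = tf.data.Dataset.from_tensor_slices(
    {"feature1": features, "user_id": user_ids}
)
ids_from_dicts = dict_ds.map(lambda x: x["user_id"])

# Build the vocabulary from either id dataset; batching is used here so the
# layer adapts on batches rather than on single scalar elements.
user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ids_from_dicts.batch(2))
vocabulary = user_ids_vocabulary.get_vocabulary()
```

Either option should avoid the TypeError. The dictionary form is probably the smaller change if the rest of the code is copied from the quickstart, since that example indexes elements by column name.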
