Reputation: 109
I want to use the TensorFlow Dataset API to initialize my dataset with TensorFlow Hub, and use the dataset.map function to convert my text data into embeddings. My TensorFlow version is 1.14.
Since the ELMo v2 module converts an array of sentences into their word embeddings, I used the following code:
import tensorflow as tf
import tensorflow_hub as hub
...
sentences_array = load_sentences()
# sentences_array = ["I love Python", "python is a good PL"]
def parse(sentences):
    # Load the ELMo v2 module and return the per-word embeddings.
    elmo = hub.Module("./ELMO")
    embeddings = elmo([sentences], signature="default", as_dict=True)["word_emb"]
    return embeddings

dataset = tf.data.TextLineDataset(sentences_array)
dataset = dataset.apply(tf.data.experimental.map_and_batch(map_func=parse,
                                                           batch_size=batch_size))
I want embeddings of the text array with shape [batch_size, max_words_in_batch, embedding_size], but I got this error instead:
"NotImplementedError: Using TF-Hub module within a TensorFlow defined
function is currently not supported."
How can I get the expected results?
Upvotes: 3
Views: 681
Reputation: 14485
Unfortunately this is not supported in TensorFlow 1.x.
It is, however, supported in TensorFlow 2.0. If you can upgrade to TensorFlow 2 and choose from the available text embedding modules for TF 2 (current list here), then you can use this in your dataset pipeline. Something like this:
embedder = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1")

def parse(sentences):
    embeddings = embedder([sentences])
    return embeddings

dataset = tf.data.TextLineDataset("text.txt")
dataset = dataset.map(parse)
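As a side note, nnlm-en-dim128 produces one 128-dimensional vector per sentence, so each mapped element above has shape [1, 128] rather than the per-word [batch_size, max_words_in_batch, embedding_size] you get from ELMo's word_emb output. If you want batched output like your original map_and_batch pipeline, a minimal sketch (assuming batch_size is defined and text.txt holds one sentence per line) would be:

# Batch the raw lines first, then embed a whole batch of strings at once;
# the embedder accepts a 1-D string tensor and returns [batch_size, 128].
dataset = tf.data.TextLineDataset("text.txt")
dataset = dataset.batch(batch_size)
dataset = dataset.map(lambda lines: embedder(lines))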
If you are tied to 1.x, or tied to ELMo (which I don't think is available in the new format yet), then the only option I can see for embedding in the preprocessing stage is to first run your dataset through a simple embedding model, save the results, and then use the embedded vectors for the downstream task separately. (I appreciate this is less than ideal.)
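A minimal sketch of that workaround, reusing the ./ELMO module from your question (the placeholder, batch loop, batch_size=32 and the .npy file names are just illustrative assumptions):

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

sentences_array = load_sentences()   # e.g. ["I love Python", "python is a good PL"]
batch_size = 32                      # assumption: pick whatever fits in memory

# Build the ELMo graph once, outside of tf.data.
elmo = hub.Module("./ELMO")
sentences_ph = tf.placeholder(tf.string, shape=[None])
word_emb = elmo(sentences_ph, signature="default", as_dict=True)["word_emb"]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    for i, start in enumerate(range(0, len(sentences_array), batch_size)):
        batch = sentences_array[start:start + batch_size]
        # Shape [len(batch), max_words_in_batch, 512]; max_words varies per
        # batch, so each batch is written to its own file.
        emb = sess.run(word_emb, feed_dict={sentences_ph: batch})
        np.save("elmo_emb_%05d.npy" % i, emb)

The saved arrays can then be loaded with np.load and fed to the downstream task, e.g. via tf.data.Dataset.from_tensor_slices.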
Upvotes: 2