Reputation: 2670
Say I have an embedding tensor:
emb = [[1, 1],
       [2, 2],
       [3, 3],
       [4, 4]]
emb = tf.constant(emb)
I have a list of sequences:
inputs = [[0, 1, 2, 3],
          [3, 2]]
I'd like to look up the embeddings and pad with zeros so that each sequence has the same length:
[[[1, 1],
  [2, 2],
  [3, 3],
  [4, 4]],
 [[4, 4],
  [3, 3],
  [0, 0],
  [0, 0]]]
I tried tf.nn.embedding_lookup, but got an error:
ValueError: Argument must be a dense tensor: [[0, 1, 2, 3], [3, 2]] - got shape [2], but wanted [2, 4].
Is it possible to achieve this without prepending [0, 0] to emb?
Upvotes: 2
Views: 2068
Reputation: 126154
The tf.nn.embedding_lookup(params, ids) function only accepts dense, rectangular tensors as the ids argument. (In general, the same goes for all TensorFlow operators that expect a tf.Tensor or tensor-like argument, such as a NumPy array.)
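For example, a rectangular ids tensor is accepted, but simply padding the shorter sequence with some id (here 0) looks up that id's embedding rather than zeros, which is why padding the ids alone does not give the output you want. A minimal sketch (the padded_ids name and the choice of 0 as the pad id are just for illustration):
import tensorflow as tf

emb = tf.constant([[1, 1], [2, 2], [3, 3], [4, 4]])

# A dense, rectangular ids tensor works with tf.nn.embedding_lookup...
padded_ids = tf.constant([[0, 1, 2, 3],
                          [3, 2, 0, 0]])   # 0 used as an arbitrary pad id
looked_up = tf.nn.embedding_lookup(emb, padded_ids)
# ...but the padded positions pick up emb[0] == [1, 1], not [0, 0]:
# [[[1, 1], [2, 2], [3, 3], [4, 4]],
#  [[4, 4], [3, 3], [1, 1], [1, 1]]]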
For sparse data, you can use tf.nn.embedding_lookup_sparse(), which accepts a tf.SparseTensor as its ids argument and can therefore represent sequences of different lengths. A tf.SparseTensor is defined from three separate (dense) tensors, representing the indices of the non-zero entries, the values of those entries, and the overall dense shape. For your example inputs, the representation would be:
inputs_sparse = tf.SparseTensor(
    # The coordinates of the non-zero entries (must be int64).
    indices=tf.constant([[0, 0], [0, 1], [0, 2], [0, 3],
                         [1, 0], [1, 1]], dtype=tf.int64),
    # The values of the respective non-zero entries.
    values=tf.constant([0, 1, 2, 3,
                        3, 2]),
    # The shape of the corresponding dense tensor (must be >= [2, 4]).
    dense_shape=[2, 4],
)
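To complete the picture, here is a minimal sketch of how the SparseTensor above could then be passed to tf.nn.embedding_lookup_sparse() (assuming a float embedding table and a "sum" combiner, both of which are choices on my part, not part of the question). Note that this combines the embeddings within each row, returning one vector per sequence rather than the padded per-position output shown in the question:
import tensorflow as tf

emb = tf.constant([[1., 1.], [2., 2.], [3., 3.], [4., 4.]])

# Looks up emb for every id in inputs_sparse and sums them per row:
# row 0 -> emb[0] + emb[1] + emb[2] + emb[3] == [10., 10.]
# row 1 -> emb[3] + emb[2]                   == [7., 7.]
combined = tf.nn.embedding_lookup_sparse(
    emb, inputs_sparse, sp_weights=None, combiner="sum")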
Upvotes: 2