dfrankow

Reputation: 21459

What are the tensorflow hash functions?

Here is the page describing TensorFlow's tf.string_to_hash_bucket_fast. (The version is currently 1.3.) It says the file that defines this function is tensorflow/python/ops/gen_string_ops.py, which doesn't seem to exist on GitHub. The gen prefix might mean it's generated. Okay.

What is the solid definition of this function, i.e. one I could use to reimplement it on another platform?

Upvotes: 3

Views: 4043

Answers (3)

shuaiyuancn

Reputation: 2794

To build on the previous answers: following the imports and includes ultimately leads you here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hash.cc

Upvotes: 0

Lerner Zhang

Reputation: 7140

Yes, that file is generated when you install TensorFlow, so you can find it on your machine at that path. For me it is located here:

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_string_ops.py

PS: Paths like this also show up in the tracebacks you get when you hit an error.
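If you would rather not guess the install location, here is a minimal sketch for asking Python where a module would be loaded from. The helper name module_source_path is made up for illustration, and I am assuming the module name matches the path from the question (tensorflow.python.ops.gen_string_ops); the example call uses the standard-library json module so it runs without TensorFlow installed.

```python
import importlib.util


def module_source_path(module_name):
    """Return the file a module would be loaded from, or None if not found.

    Hypothetical helper, not part of TensorFlow.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package in a dotted name cannot be imported.
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Standard-library module, so this works on any installation.
    print(module_source_path("json"))
    # With TensorFlow installed, the assumed module name would be:
    # print(module_source_path("tensorflow.python.ops.gen_string_ops"))
```

Note that for a dotted name, find_spec imports the parent packages, so the TensorFlow call is as slow as importing TensorFlow itself.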

The generated Python definition (shown here for string_to_hash_bucket) is:

def string_to_hash_bucket(string_tensor, num_buckets, name=None):
  r"""Converts each string in the input Tensor to its hash mod by a number of buckets.

  The hash function is deterministic on the content of the string within the
  process.

  Note that the hash function may change from time to time.
  This functionality will be deprecated and it's recommended to use
  `tf.string_to_hash_bucket_fast()` or `tf.string_to_hash_bucket_strong()`.

  Args:
    string_tensor: A `Tensor` of type `string`.
    num_buckets: An `int` that is `>= 1`. The number of buckets.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` of type `int64`.
    A Tensor of the same shape as the input `string_tensor`.
  """
  result = _op_def_lib.apply_op("StringToHashBucket",
                                string_tensor=string_tensor,
                                num_buckets=num_buckets, name=name)
  return result

You can trace what you need under /usr/local/lib/python2.7/dist-packages/ (the location varies with your setup). That Python definition is certainly not the true definition, though: it only forwards the call to the registered op, and the true definition is the C++ one described in the previous answer.
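For a reimplementation, the docstring above at least pins down the overall shape: hash each string, then take the result mod num_buckets. Here is a rough sketch of that shape only. The function names are made up for illustration, and placeholder_hash64 is a stand-in built on hashlib, not TensorFlow's hash, so the bucket ids will not match TensorFlow's until you swap in a port of the C++ hash function.

```python
import hashlib


def placeholder_hash64(data):
    """Stand-in 64-bit hash of a bytes object. NOT TensorFlow's hash."""
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def to_hash_bucket(strings, num_buckets, hash64=placeholder_hash64):
    """Map each string to a bucket id in [0, num_buckets).

    hash64 is any function from bytes to an unsigned 64-bit int; pass a
    port of the C++ hash here to reproduce a real op's output.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    return [hash64(s.encode("utf-8")) % num_buckets for s in strings]


if __name__ == "__main__":
    print(to_hash_bucket(["hello", "world"], 10))
```

Keeping the hash as a parameter means the bucketing logic stays the same once you have ported the actual hash.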

Upvotes: 1

gunxueqiu

Reputation: 21

Based on the registration information in string_to_hash_bucket_op.cc, I think the implementation of tf.string_to_hash_bucket_fast is in the StringToHashBucketOp class in the corresponding .h file.

Upvotes: 2
