Amir

Reputation: 16607

SVD in Tensorflow

I want to create vector representations from the text8 corpus using SVD (Singular Value Decomposition) in TensorFlow. I used the following code, but it does not accept the number of dimensions as a parameter:

# tf.svd returns the singular values first: (s, u, v)
s, u, v = tf.svd(cooccurrence_matrix)

I need something like TruncatedSVD in scikit-learn. What should I do? Is it possible to do the same thing in TensorFlow?

Upvotes: 1

Views: 1959

Answers (1)

bicepjai

Reputation: 1665

I take it you are working on the first assignment from cs20si. Build a co-occurrence matrix of whatever dimension you want, say (1000, 1000). Once you have the words (a list) and the dictionary that maps words to indices, you can use a NumPy ndarray to build the co-occurrence matrix like this:

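import numpy as np

# Assumes VOCAB_SIZE, words (list of tokens) and dictionary (word -> index,
# including an 'UNK' entry) are already defined.
cooccurrence_matrix = np.zeros((VOCAB_SIZE, VOCAB_SIZE))
n_words = len(words)
for i, current_word in enumerate(words):
    # Out-of-vocabulary words are counted under 'UNK'.
    if current_word not in dictionary:
        current_word = 'UNK'
    # Count the left neighbour (window size 1).
    if i != 0:
        left_word = words[i-1]
        if left_word not in dictionary:
            left_word = 'UNK'
        cooccurrence_matrix[dictionary[current_word], dictionary[left_word]] += 1
    # Count the right neighbour.
    if i < n_words-1:
        right_word = words[i+1]
        if right_word not in dictionary:
            right_word = 'UNK'
        cooccurrence_matrix[dictionary[current_word], dictionary[right_word]] += 1

print(cooccurrence_matrix.shape)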
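The loop above assumes that `words` and `dictionary` already exist. As a rough sketch of one way to build the dictionary (the helper name `build_vocab` and the choice of index 0 for 'UNK' are my own assumptions, not something from the assignment), you could keep the most frequent words and reserve one slot for 'UNK':

```python
from collections import Counter

def build_vocab(words, vocab_size):
    """Map the (vocab_size - 1) most frequent words to indices.

    Index 0 is reserved for 'UNK' (an assumed convention for this sketch).
    """
    counts = Counter(words)
    dictionary = {'UNK': 0}
    for word, _ in counts.most_common(vocab_size - 1):
        dictionary[word] = len(dictionary)
    return dictionary

# Tiny stand-in corpus; the real input would be the text8 tokens.
words = "the cat the cat the mat".split()
dictionary = build_vocab(words, vocab_size=3)
print(dictionary)
```

With this, VOCAB_SIZE in the loop above would be the same number passed as `vocab_size`, and every word outside the dictionary is counted under 'UNK'.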

After that you can just use tf.svd directly as it takes just a tensor.

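import tensorflow as tf  # TensorFlow 1.x API

# Placeholder that the co-occurrence matrix is fed into.
matrix = tf.placeholder(tf.float32, shape=(VOCAB_SIZE, VOCAB_SIZE))
tf_svd = tf.svd(matrix, compute_uv=True)
with tf.Session() as sess:
    # tf.svd returns the singular values first, then u and v.
    s, u, v = sess.run(tf_svd, feed_dict={matrix: cooccurrence_matrix})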

tf.svd returns three values, as described in the tf.svd documentation: the singular values, followed by the matrices u and v. I would start with a dictionary size of 100 to check that everything works.
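As for the number of dimensions: as far as I know, tf.svd has no parameter for it, so one option is to compute the full decomposition and keep only the first k components yourself. Below is a minimal sketch of that truncation step. It uses NumPy instead of TensorFlow so it runs on its own; the helper name `truncate_svd` and the small matrix are made up for illustration. Note that np.linalg.svd returns (u, s, vh), whereas tf.svd returns (s, u, v), but the slicing works the same way on the arrays you get back from sess.run.

```python
import numpy as np

def truncate_svd(s, u, k):
    """Return k-dimensional row embeddings from a full SVD.

    Singular values come back sorted in descending order, so the first k
    columns of u belong to the k largest singular values.
    """
    return u[:, :k] * s[:k]

# Small stand-in for a co-occurrence matrix (hypothetical values).
cooccurrence_matrix = np.array([[0., 2., 1.],
                                [2., 0., 3.],
                                [1., 3., 0.]])

u, s, vt = np.linalg.svd(cooccurrence_matrix)  # NumPy order: (u, s, vh)
embeddings = truncate_svd(s, u, k=2)
print(embeddings.shape)  # (3, 2)
```

Each row of `embeddings` is then the k-dimensional vector for the word with that index. Scaling u by the singular values should correspond to what TruncatedSVD's fit_transform gives you, up to sign; this still computes the full SVD first, so it will be slow for a large vocabulary.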

Upvotes: 1
