So, I am following this example:
https://keras.io/examples/nlp/pretrained_word_embeddings/
In this example, an embedding matrix is generated in the following section:
num_tokens = len(voc) + 2
embedding_dim = 100
hits = 0
misses = 0
# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # Words not found in embedding index will be all-zeros.
        # This includes the representation for "padding" and "OOV"
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        misses += 1
print("Converted %d words (%d misses)" % (hits, misses))
How can this be pushed to Cassandra and Hive? I have tried the following query:
statement = "CREATE TABLE schema.upcoming_calendar3 ( embedding_matrix list<frozen<set>>, PRIMARY KEY ( embedding_matrix) );"
However, that gives me the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid non-frozen collection type for PRIMARY KEY component embedding_matrix"
Similarly, I want to send it to Hive as well.
Any help on which data types to use in Cassandra and Hive would be great, along with a more efficient way of sending the data to the DB.
Currently, I am pushing data like this:
statement = "insert into schema.upcoming_calendar3(embedding_matrix) values (%s);" % (embedding_matrix)
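One direction I am considering, as a rough sketch only: store one row per token index instead of the whole matrix in a single primary-key column, with the index as the key and the vector as a list of doubles (np.zeros produces float64 by default). The keyspace, table, and column names below are hypothetical, and the DDL strings are my assumption about a workable schema, not something taken from the Keras example. The sketch only prepares the rows; it does not connect to either database.

```python
# Hypothetical schema: one row per token, keyed by its index in the matrix.
# Keyspace, table, and column names are placeholders, not from the Keras example.
CASSANDRA_DDL = (
    "CREATE TABLE IF NOT EXISTS my_keyspace.word_embeddings ("
    "token_id int PRIMARY KEY, vector list<double>)"
)
HIVE_DDL = (
    "CREATE TABLE IF NOT EXISTS word_embeddings ("
    "token_id INT, vector ARRAY<DOUBLE>)"
)
# The %s placeholders are meant to be bound by the driver,
# not filled in with Python string formatting.
CASSANDRA_INSERT = (
    "INSERT INTO my_keyspace.word_embeddings (token_id, vector) VALUES (%s, %s)"
)


def matrix_to_rows(matrix):
    """Yield (token_id, vector) pairs, one per matrix row, as plain Python types."""
    for token_id, row in enumerate(matrix):
        yield token_id, [float(x) for x in row]


# Small stand-in for embedding_matrix; a NumPy array iterates row by row the same way.
sample_matrix = [[0.0, 0.0, 0.0], [0.25, -1.5, 3.0]]
rows = list(matrix_to_rows(sample_matrix))
print(rows)

# With an already connected cassandra-driver session (not created here),
# each row could then be sent as:
# for params in rows:
#     session.execute(CASSANDRA_INSERT, params)
```

Passing the values as parameters would avoid formatting the whole NumPy array into the statement string, which is what my current insert does.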