Reputation: 409
I am building a text classification model that assigns each comment to one of two categories, using GloVe word embeddings. My data has two columns: one with the text (comments) and one with a binary target variable (whether a comment is actionable or not). I was able to generate GloVe word embeddings for the text using the following code from the text2vec documentation.
glove_model <- GlobalVectors$new(word_vectors_size = 50, vocabulary = glove_pruned_vocab, x_max = 20L)
#fit model and get word vectors
word_vectors_main <- glove_model$fit_transform(glove_tcm,n_iter = 20,convergence_tol=-1)
word_vectors_context <- glove_model$components
word_vectors <- word_vectors_main+t(word_vectors_context)
How do I build a model and generate predictions on test data?
Upvotes: 5
Views: 2533
Reputation: 409
Got it.
glove_model <- GlobalVectors$new(word_vectors_size = 50, vocabulary = glove_pruned_vocab, x_max = 20L)
#fit model and get word vectors
word_vectors_main <- glove_model$fit_transform(glove_tcm,n_iter =20,convergence_tol=-1)
word_vectors_context <- glove_model$components
word_vectors <- word_vectors_main+t(word_vectors_context)
After creating the word embeddings, build an index that maps words (strings) to their vector representations (numeric vectors).
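# `lines` is not defined in this snippet; it is assumed to hold GloVe-format text lines
# (a word followed by its coefficients, separated by spaces), e.g. as read with readLines()
embeddings_index <- new.env(parent = emptyenv())
for (line in lines) {
  values <- strsplit(line, ' ', fixed = TRUE)[[1]]
  word <- values[[1]]
  coefs <- as.numeric(values[-1])
  embeddings_index[[word]] <- coefs
}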
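If the vectors come straight from text2vec rather than from a text file, the same index can probably be built from the rows of the matrix. The sketch below uses a small made-up matrix in place of the real `word_vectors` and assumes the vocabulary terms are its row names.

```r
# Toy stand-in for the text2vec result: one row per word, one column per dimension.
# (Assumption: the real `word_vectors` carries the vocabulary terms as row names.)
word_vectors <- matrix(
  c(0.1, 0.2, 0.3,
    0.4, 0.5, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("good", "bad"), NULL)
)

# Map each word to its numeric vector, mirroring the loop above.
embeddings_index <- new.env(parent = emptyenv())
for (word in rownames(word_vectors)) {
  embeddings_index[[word]] <- as.numeric(word_vectors[word, ])
}

embeddings_index[["bad"]]  # numeric vector with one value per embedding dimension
```

Looking up a word that is not in the index returns NULL, which is what the later `is.null()` check relies on.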
Next, build an embedding matrix of shape (max_words, embedding_dim) that can be loaded into an embedding layer.
embedding_dim <- 50  # number of dimensions used to represent each word
# max_words (vocabulary size cap) and word_index (word -> integer index, e.g. from a Keras tokenizer)
# are assumed to be defined already
embedding_matrix <- array(0, c(max_words, embedding_dim))
for (word in names(word_index)) {
  index <- word_index[[word]]
  if (index < max_words) {
    embedding_vector <- embeddings_index[[word]]
    if (!is.null(embedding_vector)) {
      # words not found in the embedding index will be all zeros
      embedding_matrix[index + 1, ] <- embedding_vector
    }
  }
}
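To see how the row offset and the all-zero rows work out, here is a toy run of the loop above with a three-word vocabulary. All names and values are made up for illustration, and the final coverage check is an addition, not part of the original answer.

```r
max_words <- 5
embedding_dim <- 3

# Hypothetical tokenizer output: word -> integer index (1 = most frequent).
word_index <- list(good = 1L, bad = 2L, unknown = 3L)

# Hypothetical embedding lookup; "unknown" has no vector on purpose.
embeddings_index <- new.env(parent = emptyenv())
embeddings_index[["good"]] <- c(0.1, 0.2, 0.3)
embeddings_index[["bad"]]  <- c(0.4, 0.5, 0.6)

embedding_matrix <- array(0, c(max_words, embedding_dim))
for (word in names(word_index)) {
  index <- word_index[[word]]
  if (index < max_words) {
    embedding_vector <- embeddings_index[[word]]
    if (!is.null(embedding_vector)) {
      # Row index + 1: Keras indices start at 0, R rows start at 1.
      embedding_matrix[index + 1, ] <- embedding_vector
    }
  }
}

# Rows that received a vector; the rest stay at zero.
covered <- rowSums(embedding_matrix != 0) > 0
sum(covered)
```

A low count of covered rows on real data would suggest that the tokenizer vocabulary and the embedding vocabulary were preprocessed differently (casing, punctuation, pruning).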
We can then load this embedding matrix into the embedding layer, build a model, and generate predictions.
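# maxlen is assumed to be the padded length of each input sequence;
# layer_flatten() followed by a dense layer needs a fixed sequence length
model_pretrained <- keras_model_sequential() %>%
  layer_embedding(input_dim = max_words, output_dim = embedding_dim, input_length = maxlen) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")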
summary(model_pretrained)
# Load the GloVe embeddings into the model and freeze them
get_layer(model_pretrained, index = 1) %>%
  set_weights(list(embedding_matrix)) %>%
  freeze_weights()
model_pretrained %>% compile(optimizer = "rmsprop", loss = "binary_crossentropy", metrics = c("accuracy"))
history <- model_pretrained %>% fit(x_train, y_train, validation_data = list(x_val, y_val),
                                    epochs = num_epochs, batch_size = 32)
Then use the standard predict function to generate predictions on the test data.
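With a single sigmoid output, the predictions come back as probabilities, so they still need to be turned into class labels. A minimal sketch, assuming the result of something like `predict(model_pretrained, x_test)` is a one-column matrix; `x_test`, `y_test`, the values, and the 0.5 cut-off are all made up here.

```r
# Stand-in for the output of predict(model_pretrained, x_test):
# a one-column matrix of sigmoid probabilities (values are made up).
probs <- matrix(c(0.91, 0.12, 0.55, 0.40), ncol = 1)
y_test <- c(1, 0, 0, 0)  # hypothetical true labels

# Threshold at 0.5 to get class labels (1 = actionable, 0 = not actionable).
pred_class <- as.integer(probs[, 1] > 0.5)

# Confusion table and accuracy on the toy data.
confusion <- table(predicted = pred_class, actual = y_test)
accuracy <- mean(pred_class == y_test)
```

If the two classes are imbalanced, a cut-off other than 0.5 may work better; that is something to tune on the validation set rather than the test set.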
See also: Use word embeddings to build a model in Keras
Upvotes: 0
Reputation: 1687
text2vec has a standard predict method (like most of the R libraries anyway) that you can use in a straightforward fashion: have a look at the documentation.
To make a long story short, just use
predictions <- predict(fitted_model, data)
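As an illustration of that generic fit-then-predict pattern, here is a sketch that uses base R's `glm` as a stand-in classifier, not a text2vec model. The features are random numbers standing in for document-level features (for example, averaged word vectors), and the column names are hypothetical.

```r
set.seed(1)

# Toy document-level features; values are random, not real embeddings.
n <- 40
train <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
# Hypothetical binary target, loosely related to f1, with noise.
train$actionable <- rbinom(n, 1, plogis(1.5 * train$f1))

# Fit a logistic regression with base R.
fitted_model <- glm(actionable ~ f1 + f2, data = train, family = binomial())

# New data must have the same feature columns as the training data.
test <- data.frame(f1 = rnorm(5), f2 = rnorm(5))
predictions <- predict(fitted_model, newdata = test, type = "response")
```

Here `type = "response"` asks `predict` for probabilities instead of log-odds; other model classes use different arguments, so the relevant help page is worth checking.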
Upvotes: 1