Reputation: 13
Problem Statement:
I am currently working on Aspect-Based Sentiment Analysis, where the objective is to analyze how sentiment trends change within a sentence using temporal windows. Ultimately, I aim to develop a contrastive learning model. To achieve this, I am using a pre-trained RoBERTa transformer model along with an attention mechanism.
For instance, given the sentence: "Battery life is good, but camera is very bad"
With the aspects being "Battery life" and "camera," the dataset is structured as follows:
| Index | Sentence | Aspect | Polarity |
|---|---|---|---|
| 1 | "Battery life is good, but camera is very bad" | Battery Life | Positive |
| 2 | "Battery life is good, but camera is very bad" | Camera | Negative |
To obtain sentence embeddings, I use a pre-trained RoBERTa transformer model. I then intend to pass these embeddings through an attention mechanism to obtain embeddings that are 'aspect-aware'. In essence, I want the attention mechanism to weight the words in the sentence according to the given aspect, so that each word's relative importance differs between aspects: the word scores for the index 1 row should differ from those for the index 2 row, each optimized for its own aspect.
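To make the intended behaviour concrete, this is the kind of per-aspect weighting I am after (illustrative numbers only, not real model output):

# Illustrative only - the kind of per-aspect token weights I hope the attention learns
weights_battery_life = {"Battery": 0.30, "life": 0.25, "is": 0.02, "good": 0.35,
                        "but": 0.02, "camera": 0.02, "very": 0.02, "bad": 0.02}
weights_camera = {"Battery": 0.02, "life": 0.02, "is": 0.02, "good": 0.02,
                  "but": 0.03, "camera": 0.36, "very": 0.13, "bad": 0.40}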
Tokenizer and model used (imports included for completeness):

from transformers import RobertaTokenizer, RobertaModel
import torch
import torch.nn as nn
import torch.nn.functional as F

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
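The attention module below calls a scaled_dot_product helper; it is the standard scaled dot-product attention (softmax of q·kᵀ / sqrt(d_k), applied to v), roughly:

def scaled_dot_product(q, k, v, mask=None):
    # Standard scaled dot-product attention:
    # attention = softmax(q @ k^T / sqrt(d_k)), values = attention @ v
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
    if mask is not None:
        # Positions where mask == 0 (e.g. padding) get -inf before the softmax
        scores = scores.masked_fill(mask == 0, float('-inf'))
    attention = F.softmax(scores, dim=-1)
    values = torch.matmul(attention, v)
    return values, attention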
class MultiheadAttentionWithAspect(nn.Module):
    def __init__(self, input_dim, d_model, aspect_embedding_dim, num_heads):
        super(MultiheadAttentionWithAspect, self).__init__()
        self.input_dim = input_dim
        self.d_model = d_model
        self.num_heads = num_heads
        self.aspect_embedding_dim = aspect_embedding_dim
        self.qkv_layer = nn.Linear(d_model, 3 * d_model)  # changed input_dim to d_model
        self.aspect_linear = nn.Linear(aspect_embedding_dim, d_model)  # linear layer for aspect embeddings
        self.linear_layer = nn.Linear(d_model, d_model)

    def forward(self, x, aspect_embeddings, mask=None):
        batch_size, sequence_length, input_dim = x.size()
        # Project the aspect embedding and broadcast it across the sequence
        # (aspect_embeddings is expected to have shape (batch, 1, aspect_embedding_dim))
        aspect_embeddings_expanded = self.aspect_linear(aspect_embeddings).repeat(1, sequence_length, 1)
        qkv = self.qkv_layer(x)
        q, k, v = qkv.chunk(3, dim=-1)
        output_features = self.d_model // self.num_heads  # desired per-head output features (currently unused)
        # Apply aspect_linear's weight a second time (no bias) to the expanded embeddings
        aspect_embeddings_projected = F.linear(aspect_embeddings_expanded, self.aspect_linear.weight)
        aspect_weights = torch.sigmoid(aspect_embeddings_projected)  # sigmoid activation for gating
        x_with_aspect = x * aspect_weights
        k_concat = k + aspect_embeddings_projected  # element-wise addition of aspect information
        v_concat = v + aspect_embeddings_projected
        # x_with_aspect acts as the query here (q from qkv_layer is not used)
        values, attention = scaled_dot_product(x_with_aspect, k_concat, v_concat, mask)
        values = F.dropout(values, p=0.1, training=self.training)
        output = self.linear_layer(values)
        return output, attention
This class is called like this:
input_dim = 768 # Dimensionality of input embeddings
d_model = 768 # Dimensionality of the model
num_heads = 8 # Number of attention heads
batch_size = 32
sequence_length = 398
aspect_embedding_dim = 768
x = sentence_embeddings
# Initialize encoder self-attention module
encoder_self_attention = MultiheadAttentionWithAspect(input_dim, d_model, aspect_embedding_dim, num_heads)
# Forward pass
output, attention = encoder_self_attention(x, aspect_embeddings, mask)
print("Attention:", attention)
The issue is that for the following sentence and aspect, after I retrieve the sentence embeddings and aspect embeddings, the attention scores I get for the sentence are all essentially the same:
sentence = "The food at this restaurant is delicious."
aspect = "food"
max_seq_length = 400  # longest sentence in df
sentence_embeddings, aspect_embeddings, input_ids = tokenize_and_contextualize(sentence, aspect, tokenizer, model, max_seq_length)
# Initialize encoder self-attention module
encoder_self_attention = MultiheadAttentionWithAspect(input_dim, d_model, aspect_embedding_dim, num_heads)
# Forward pass
output, attention = encoder_self_attention(x, aspect_embeddings, mask)
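For completeness, tokenize_and_contextualize is not shown above; a simplified sketch of what it does (the real function also handles some extra bookkeeping) is:

def tokenize_and_contextualize(sentence, aspect, tokenizer, model, max_seq_length):
    # Encode the sentence, padded to a fixed length, and take RoBERTa's token embeddings
    encoded = tokenizer(sentence, padding='max_length', max_length=max_seq_length,
                        truncation=True, return_tensors='pt')
    with torch.no_grad():
        sentence_embeddings = model(**encoded).last_hidden_state  # (1, max_seq_length, 768)

    # Encode the aspect phrase and mean-pool its token embeddings into a single vector
    encoded_aspect = tokenizer(aspect, return_tensors='pt')
    with torch.no_grad():
        aspect_hidden = model(**encoded_aspect).last_hidden_state
    aspect_embeddings = aspect_hidden.mean(dim=1, keepdim=True)  # (1, 1, 768)

    return sentence_embeddings, aspect_embeddings, encoded['input_ids']

The mask passed to the forward call is meant to be the padding mask from the same tokenizer call, e.g. encoded['attention_mask'].unsqueeze(1), so that it broadcasts over the (batch, seq_len, seq_len) score matrix.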
Attention Scores when printed out:
Attention Scores: tensor([[[0.0026, 0.0025, 0.0026, ..., 0.0025, 0.0025, 0.0025],
[0.0026, 0.0024, 0.0025, ..., 0.0025, 0.0025, 0.0025],
[0.0026, 0.0024, 0.0025, ..., 0.0025, 0.0025, 0.0025],
...,
[0.0027, 0.0026, 0.0026, ..., 0.0025, 0.0025, 0.0025],
[0.0027, 0.0026, 0.0026, ..., 0.0025, 0.0025, 0.0025],
[0.0027, 0.0026, 0.0026, ..., 0.0025, 0.0025, 0.0025]]],
grad_fn=<SoftmaxBackward0>)
Output:
Output: tensor([[[-0.1860, 0.0446, -0.0739, ..., -0.0274, 0.0770, 0.0683],
[-0.1819, 0.0725, 0.0731, ..., -0.0391, 0.0886, 0.0440],
[-0.2937, -0.0031, -0.0294, ..., 0.1002, 0.1164, 0.0995],
...,
[-0.1288, 0.1084, -0.1185, ..., 0.2159, 0.1058, 0.1133],
[-0.2388, 0.1785, -0.0160, ..., 0.0545, 0.1429, 0.0658],
[-0.2878, -0.0587, -0.0592, ..., 0.0542, 0.1209, 0.0370]]],
grad_fn=<ViewBackward0>)
Any help on the following would be appreciated:

1. Is the attention mechanism architecture alright?
2. Any insight into why all the attention scores are the same?
3. Is there a better way to inject the 'aspect' information into the sentence embeddings instead of concatenating them?
4. Improvements to the current model, keeping in mind that the goal is to learn multiple aspects within a sentence.
Thank you for your help.
I expect that words relating to the aspect, and the aspect itself, will get high scores.
Upvotes: 0
Views: 60