StackOverflow Questions for Tag: self-attention

sarvenaz

Reputation: 11

Non-deterministic behavior of a CNN after adding self-attention

Score: 1

Views: 38

Answers: 0

cangozpi

Reputation: 159

Trouble understanding the formula for estimating dense self-attention FLOPS per Token given as 6LH(2QT)

Score: 0

Views: 16

Answers: 0

BigWinnz101

Reputation: 63

Failing to Finalize Execution Plan Using cuDNN Backend to Create a Fused Attention fprop Graph

Score: -4

Views: 59

Answers: 1

Matija Špeletić

Reputation: 146

How to solve "Torch was not compiled with flash attention" warning?

Score: 7

Views: 24732

Answers: 2

killermama98

Reputation: 45

Receptive Field in Swin Transformer

Score: 1

Views: 232

Answers: 2

chungking youjun

Reputation: 1

Flash attention gives different results for tokens with identical embeddings?

Score: 0

Views: 90

Answers: 0

Chiara

Reputation: 490

How to read a BERT attention weight matrix?

Score: 3

Views: 4031

Answers: 2

jun j

Reputation: 1

How to output attentions in Transformers BART model

Score: 0

Views: 57

Answers: 1

phd Mom

Reputation: 11

Multi-head self-attention for sentiment analysis gives inaccurate results

Score: 1

Views: 52

Answers: 0

MrGeniusProgrammer

Reputation: 11

Cannot backpropagate through multi-head attention in TensorFlow.js

Score: 1

Views: 45

Answers: 0

Rohollah

Reputation: 1

How to handle None values for condition_input in a CrossAttention block in PyTorch?

Score: 0

Views: 52

Answers: 0

Peter

Reputation: 9

PyTorch Vision Transformer - How to Visualise Attention Layers

Score: 0

Views: 747

Answers: 0

Wassim Jaoui

Reputation: 95

Interpreting the rows and columns of the attention Heatmap

Score: 0

Views: 55

Answers: 0

Wassim Jaoui

Reputation: 95

Attention Tensor Shape meaning

Score: 1

Views: 74

Answers: 0

Christina

Reputation: 1

Can we mask two words at the same time in an attention mechanism?

Score: 0

Views: 24

Answers: 0

user24820368

Reputation: 1

How to retrieve query/key/value/output parameters of a pretrained BERT model?

Score: 0

Views: 116

Answers: 0

user24606639

Reputation: 1

ValueError: Dimension must be 2 but is 3 (keras Attention)

Score: 0

Views: 143

Answers: 0

sk-19

Reputation: 13

Attention Mechanism Scores are the same

Score: 0

Views: 60

Answers: 0

Razbolt

Reputation: 39

PyTorch: Is it possible to apply self-attention to Faster R-CNN's ROI layer?

Score: 0

Views: 90

Answers: 0

thejbug

Reputation: 41

Can TensorFlow optimize the extract_patches operation to avoid large redundant tensors?

Score: 0

Views: 33

Answers: 0
