srswat

Reputation: 165

Implementation of Pass2Edit to model string edit behaviour

I am trying to implement Pass2Edit (this paper; see Sections 3.1 and 3.2). It takes in the original password and the current password as strings, and tries to model the edit behaviour. The following is what it looks like:

[Figure: Pass2Edit model architecture from the paper]

The input of the neural network is the password pair, and the output is the probability of each transformation state. From the paper I understand that the model works by:

  1. First, the input passes through the embedding layer, and each one-hot encoded password character is converted into a 256-dimensional vector (i.e., v_orig_i and v_cur_i).
  2. Next, v_orig_i and v_cur_i are concatenated into v_i and fed into a 3-layer GRU (hidden dimension 256).
  3. Finally, the GRU output for the last character is passed through a 2-layer FC (fully connected) network with hidden dimension 512, and the probability of each transformation t_i is obtained from the softmax layer.

Specifically, after each password is transformed into a key sequence, the character set Σ includes 48 types of characters that can be entered through the EN-US standard keyboard, as well as <shift>, <caps> and <placeholder> (48+3=51). If we limit the length of the password to no more than 30 (i.e., 0≤p<30), then the total number of atomic operations is |t| = 30∗51 + 30 + 1 = 1,561, where 30∗51 is the number of insertion classes, 30 is the number of deletion classes, and 1 represents the EOS operation. In this light, the one-step prediction process can essentially be seen as a 1,561-class multi-classification problem.

I am very new to writing a RNN model, and am not able to translate this to the pytorch GRU implementation.

Specifically:

  1. Since the dataset contains variable-length password pairs and the paper limits the password length to 30, does that mean that when l<30 the remaining GRU time steps simply do not engage?
  2. The same goes for the final number of classes I have to predict. Since the model treats it as a 1,561-class prediction problem, there are classes that are simply irrelevant for l<30, for example the class INS(14, "a") when the password length is 8.
  3. How do I incorporate the caps key, shift key and "placeholders" they mention in the paper?

An outline of the model, some clarity on how l<30 passwords will work and a way to put in caps key, shift key and "placeholders" would be really helpful. Thanks!

Upvotes: 0

Views: 59

Answers (1)

Odin

Reputation: 11

I am also looking at this paper, but I am not good at writing RNNs either.

1: I think each string has an EOS token that marks the end of the string; the RNN will only make use of the information before the EOS.
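
Here is a rough sketch of how I would translate the architecture into PyTorch, using padding plus pack_padded_sequence so that time steps after the true length never reach the GRU. The class name, the padding index, and the choice of ReLU in the FC part are my own assumptions, not from the paper; it also assumes the two passwords in a pair have already been aligned to the same length:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

VOCAB_SIZE = 51      # 48 keyboard chars + <shift> + <caps> + <placeholder>
PAD_IDX = 0          # assumed padding index, not specified by the paper
NUM_CLASSES = 1561   # 30*51 insertions + 30 deletions + 1 EOS
MAX_LEN = 30

class Pass2EditSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 256, padding_idx=PAD_IDX)
        # concatenated (orig, cur) embeddings -> 512-dim GRU input
        self.gru = nn.GRU(input_size=512, hidden_size=256,
                          num_layers=3, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, NUM_CLASSES),
        )

    def forward(self, orig_ids, cur_ids, lengths):
        # orig_ids, cur_ids: (batch, MAX_LEN) padded character indices
        # lengths: (batch,) true lengths of each aligned pair
        v = torch.cat([self.embed(orig_ids), self.embed(cur_ids)], dim=-1)
        packed = pack_padded_sequence(v, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        _, h_n = self.gru(packed)   # h_n: (num_layers, batch, 256)
        last = h_n[-1]              # top-layer state after the last real character
        return self.fc(last)        # logits over the 1,561 operations
```

Training with nn.CrossEntropyLoss on these logits gives the softmax over the 1,561 operations; because the sequences are packed, the padded positions after the true length never influence the hidden state.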

2: 1,561 is calculated as follows: each position has 51 possibilities (US keyboard + Shift + Caps + Placeholder) for an insertion, and there are 30 positions (since the maximum length is 30), which gives 30∗51 = 1,530. You also have 30 positions for deletion, plus one class for EOS, which comes to 1,561. If a password has length 10, the classes related to positions 11 to 30 just won't be activated (if the model is predicting correctly).
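
The exact ordering of the 1,561 classes is not spelled out here, but any consistent mapping works as long as you use the same one for building training labels and for decoding predictions. Something like the following, where the function names and the ordering are my own choice:

```python
SIGMA = 51     # alphabet size: 48 keyboard chars + <shift> + <caps> + <placeholder>
MAX_LEN = 30

def ins_class(pos, char_id):
    """INS(pos, char): insert character char_id at position pos (0 <= pos < 30)."""
    return pos * SIGMA + char_id            # indices 0 .. 1529

def del_class(pos):
    """DEL(pos): delete the character at position pos."""
    return MAX_LEN * SIGMA + pos            # indices 1530 .. 1559

EOS_CLASS = MAX_LEN * SIGMA + MAX_LEN       # index 1560

def decode_class(c):
    """Map a class index back to a human-readable operation."""
    if c == EOS_CLASS:
        return ("EOS",)
    if c >= MAX_LEN * SIGMA:
        return ("DEL", c - MAX_LEN * SIGMA)
    return ("INS", c // SIGMA, c % SIGMA)
```

With such a mapping, a class like INS(14, "a") still exists in the output layer for an 8-character password; it simply never appears as a training target for that pair, so the model learns to assign it (near) zero probability.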

3: For the Caps, Shift and Placeholder, you might refer to another prior work: https://ieeexplore.ieee.org/abstract/document/8835247?casa_token=Me_vgHsZvI4AAAAA:a2uw0Skz7iCjUbko8x5i_dVxouayxCpbZ5imdLRctc_j2D23Wvr6KmW7o1v53dJ8LcOhqSY
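
My understanding is that a password is first rewritten as the sequence of physical keys pressed, so an uppercase or shifted character becomes <shift> followed by the base key, and <placeholder> pads the shorter string of a pair. This is only my reading; the exact rules (for example when <caps> is used instead of repeated <shift>) should be checked against that prior work:

```python
SHIFT, CAPS, PLACEHOLDER = "<shift>", "<caps>", "<placeholder>"

# shifted symbol -> unshifted key on an EN-US keyboard
SHIFT_MAP = {
    "!": "1", "@": "2", "#": "3", "$": "4", "%": "5",
    "^": "6", "&": "7", "*": "8", "(": "9", ")": "0",
    "_": "-", "+": "=", "{": "[", "}": "]", "|": "\\",
    ":": ";", '"': "'", "<": ",", ">": ".", "?": "/", "~": "`",
}

def to_key_sequence(password):
    """Rewrite a password as the sequence of keys pressed on an EN-US keyboard."""
    keys = []
    for ch in password:
        if ch.isupper():
            # an uppercase letter is typed as <shift> + the lowercase key
            keys += [SHIFT, ch.lower()]
        elif ch in SHIFT_MAP:
            keys += [SHIFT, SHIFT_MAP[ch]]
        else:
            keys.append(ch)
    return keys

def pad_pair(orig_keys, cur_keys):
    """Pad the shorter key sequence with <placeholder> so the pair aligns."""
    n = max(len(orig_keys), len(cur_keys))
    return (orig_keys + [PLACEHOLDER] * (n - len(orig_keys)),
            cur_keys + [PLACEHOLDER] * (n - len(cur_keys)))

print(to_key_sequence("Pa$$w0rd"))
# ['<shift>', 'p', 'a', '<shift>', '4', '<shift>', '4', 'w', '0', 'r', 'd']
```

Runs of uppercase letters could instead be wrapped in <caps> tokens, which is presumably why <caps> is part of the 51-symbol alphabet, but I have not verified how the prior work handles that case.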

Upvotes: 1
