dan
dan

Reputation: 1050

XOR File Decryption

So I have to decrypt a .txt file that is crypted with XOR code and with a repeated password that is unknown, and the goal is to discover the message.

Here are the things that I already know because of the professor:

  1. First I need to find the length of the unknown password

  2. The message has been altered and it doesn't have spaces (this may add a bit more difficulty because the space character has the highest frequency in a message)

Any ideas on how to solve this?

thx in advanced :)

Upvotes: 4

Views: 7868

Answers (4)

Sani Huttunen
Sani Huttunen

Reputation: 24385

First you need to find out the length of the password. You do this by assessing the Index of Coincidence or Kappa-test. XOR the ciphertext with itself shifted 1 step and count the number of characters that are the same (value 0). You get the Kappa value by dividing the result with the total number of characters minus 1. Shift one more time and again calculate the Kappa value. Shift the ciphertext as many times as needed until you discover the password length. If the length is 4 you should see something similar to this:

Offset             Hits
-------------------------
  1              2.68695%
  2              2.36399%
  3              3.79009%
  4              6.74012%
  5              3.6953%
  6              1.81582%
  7              3.82744%
  8              6.03504%
  9              3.60273%
 10              1.98052%
 11              3.83241%
 12              6.5627%

As you see the Kappa value is significantly higher on multiples of 4 (4, 8 and 12) than the others. This suggests that the length of the password is 4.

Now that you have the password length you should again XOR the cipher text with itself but now you shift by multiples of the length. Why? Since the ciphertext looks like this:

THISISTHEPLAINTEXT    <- Plaintext
PASSPASSPASSPASSPA    <- Password
------------------
EJKELDOSOSKDOWQLAG    <- Ciphertext

When two values which are the same are XOR:ed the result is 0:

EJKELDOSOSKDOWQLAG        <- Ciphertext
    EJKELDOSOSKDOWQLAG    <- Ciphertext shifted 4.

Is in reality:

THISISTHEPLAINTEXT        <- Plaintext
PASSPASSPASSPASSPA        <- Password
    THISISTHEPLAINTEXT    <- Plaintext
    PASSPASSPASSPASSPA    <- Password

Which is:

THISISTHEPLAINTEXT        <- Plaintext
    THISISTHEPLAINTEXT    <- Plaintext

As you see the password "disappears" and the plaintext is XOR:ed with itself.

So what can we do now then? You wrote that the spaces are removed. This makes it a bit harder to get the plaintext or password. But not at all impossible.

The following table shows the ciphertext values for all english characters:

   A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
A  0                                                                           
B  3  0                                                                        
C  2  1  0                                                                     
D  5  6  7  0                                                                  
E  4  7  6  1  0                                                               
F  7  4  5  2  3  0                                                            
G  6  5  4  3  2  1  0                                                         
H  9 10 11 12 13 14 15  0                                                      
I  8 11 10 13 12 15 14  1  0                                                   
J 11  8  9 14 15 12 13  2  3  0                                                
K 10  9  8 15 14 13 12  3  2  1  0                                             
L 13 14 15  8  9 10 11  4  5  6  7  0                                          
M 12 15 14  9  8 11 10  5  4  7  6  1  0                                       
N 15 12 13 10 11  8  9  6  7  4  5  2  3  0                                    
O 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0                                 
P 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31  0                              
Q 16 19 18 21 20 23 22 25 24 27 26 29 28 31 30  1  0                           
R 19 16 17 22 23 20 21 26 27 24 25 30 31 28 29  2  3  0                        
S 18 17 16 23 22 21 20 27 26 25 24 31 30 29 28  3  2  1  0                     
T 21 22 23 16 17 18 19 28 29 30 31 24 25 26 27  4  5  6  7  0                  
U 20 23 22 17 16 19 18 29 28 31 30 25 24 27 26  5  4  7  6  1  0               
V 23 20 21 18 19 16 17 30 31 28 29 26 27 24 25  6  7  4  5  2  3  0            
W 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24  7  6  5  4  3  2  1  0         
X 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23  8  9 10 11 12 13 14 15  0      
Y 24 27 26 29 28 31 30 17 16 19 18 21 20 23 22  9  8 11 10 13 12 15 14  1  0   
Z 27 24 25 30 31 28 29 18 19 16 17 22 23 20 21 10 11  8  9 14 15 12 13  2  3  0

What does this mean then? If an A and a B is XOR:ed then the resulting value is 3. E and P will result in 21. Etc. OK but how will this help you?

Remember that the plaintext is XOR:ed with itself shifted by multiples of the password length. For each value you can check the above table and determine what combinations that position could have. Lets say the value is 25 then the two characters that resulted in the value 25 could be one of the following combinations:(I-P), (H-Q), (K-R), (J-S), (M-T), (L-U), (O-V), (N-W), (A-X) or (C-Z). But which one? Now you do more shifts and look up the corresponding values in the table again for each position. Next time the value might be 7 and since you already have a list of possible character combinations you only check against them. At the next two shifts the values are 3 and 1. Now you can determine that the character is W since that is the only common character in each shift, (N-W), (P-W), (T-W), (V-W). You can do this for most positions.

You will not get all the plaintext but you will get enough characters to discover the password. Take the known characters and XOR them in the correct position in the ciphertext. This will yield the password. The number of known characters you need atleast is the number of characters in the password if they are at the "correct" positions in regards to the password.

Good luck!

Upvotes: 13

rossum
rossum

Reputation: 15695

The most common three letter trigram in English (assuming the language is probably English) is "the". Place "the" at all possible points on your cyphertext to derive a possible 3 characters of the key. Try each possible key fragment at all other possible positions on the cyphertext and see what you get. For example, "qzg" is unlikely to be correct, but "fen" could be. Look at the spacing between possible positions to derive the key length. With a key length and a key fragment you can place a lot more of the key.

As Lars said, look at ways of decrypting Vigenère, which is effectively what you have here.

Upvotes: 0

schnaader
schnaader

Reputation: 49729

Although spaces are the most common characters and make decryptions like this easy, the other character also have different frequencies. For example, see this Wikipedia article. If you've got enough encrypted text and the password length isn't too large, it might just be enough to find out the most common bytes in the encrypted text. They will most likely be the encrypted versions of e that has the highest frequency in english texts. This alone won't give you the decrypted text, but it's very likely you can find out the password length and (part of) the password itself with it. For example, let's assume the most frequent encrypted bytes are

w x m z y

with almost the same frequency and there's a significant drop in frequency after the last one. This will tell you two things:

  1. The password length most likely is 5, because statistically, all encrypted e will be equally likely. EDIT: OK, this isn't correct, it will be 5 or above because the password can contain the same character multiple times.
  2. The password will be some permutation of (w x m z y XOR e e e e e) - you can use the byte offsets modulo the password length to get the correct permutation.

EDIT: The same character occuring in the password multiple times makes things a bit harder, but you'll most likely be able to identify those because as I said, encrypted versions of e will cluster around frequency f - now if the character occurs n times, it will have a frequency near n*f.

Upvotes: 1

Lars
Lars

Reputation: 5799

you should look at cracking a vigenere chiffre, especially at auto-correlation. The latter will help you finding out the length of the password and the rest is usually just bruteforcing on the normal distribution of letters (where the most common one is the letter e in the english language).

Upvotes: 1

Related Questions