john johnson

Reputation: 730

Removing non-ASCII characters from a UTF-16 string (Python)

I have some code I'm using to decrypt a string. The string is originally encrypted by .NET code, and I'm able to make the decryption work. However, the string coming into Python has some extra characters in it, and it has to be decoded as UTF-16.

Here is the code for the decryption portion. The original string I encrypted was "test2", which is what the text variable in the code below should decrypt to.

import Crypto.Cipher.AES
import base64, sys

password = base64.b64decode('PSCIQGfoZidjEuWtJAdn1JGYzKDonk9YblI0uv96O8s=') 
salt = base64.b64decode('ehjtnMiGhNhoxRuUzfBOXw==') 
aes = Crypto.Cipher.AES.new(password, Crypto.Cipher.AES.MODE_CBC, salt)
text = base64.b64decode('TzQaUOYQYM/Nq9f/pY6yaw==')

print(aes.decrypt(text).decode('utf-16'))
text1 = aes.decrypt(text).decode('utf-16')
print(text1)

My issue is that when I decrypt and print the result of text, it is "test2ЄЄ" instead of the expected "test2".

If I save the same decrypted value into a variable, it gets decoded incorrectly as "틊첃陋ភ滑毾穬ヸ".

My goal is to find a way to:

  1. strip the non-ASCII characters off the end of the test2 value
  2. store the result in a variable holding the correct string value

Any help or suggestions appreciated. Thanks!

Upvotes: 0

Views: 478

Answers (1)

cs95

Reputation: 403120

In Python 2, if you have a byte string (str), you can use str.decode, like this:

string.decode('ascii', 'ignore')

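Here ascii is the codec, and 'ignore' specifies that anything that cannot be converted is dropped. For a unicode object, such as the result of .decode('utf-16'), use .encode('ascii', 'ignore') instead.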
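In Python 3, byte strings are the bytes type, and bytes.decode accepts the same 'ignore' error handler. A minimal sketch of what 'ignore' does, assuming some made-up input bytes that are ASCII text followed by a few non-ASCII bytes (the sample value is hypothetical, not taken from the question):

```python
# Hypothetical input: "test2" followed by the UTF-8 bytes of "ЄЄ" (all >= 0x80).
raw = b"test2\xd0\x84\xd0\x84"

# The ascii codec only accepts bytes 0x00-0x7F; 'ignore' silently drops the rest
# instead of raising UnicodeDecodeError.
clean = raw.decode("ascii", "ignore")

print(clean)  # test2
```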

In Python 3, str objects are already decoded Unicode text and have no decode method, so you need to encode to ASCII first (dropping whatever cannot be encoded) and then decode the resulting bytes back to a str:

string.encode('ascii', 'ignore').decode()
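A minimal Python 3 sketch applying this to the value from the question. The trailing "ЄЄ" is U+0404 twice, which is exactly what four 0x04 bytes decode to as UTF-16. That is consistent with PKCS#7 padding (the default padding mode for .NET's symmetric algorithm classes), so the sketch also shows removing the padding before decoding as an alternative. Assumptions: the byte layout of decrypted (a UTF-16 BOM, "test2" in UTF-16-LE, then four padding bytes) is a hypothetical reconstruction, and strip_non_ascii and remove_pkcs7_padding are illustrative helpers, not library functions.

```python
def strip_non_ascii(s: str) -> str:
    """Drop every character that cannot be encoded as ASCII (the approach above)."""
    return s.encode("ascii", "ignore").decode()


def remove_pkcs7_padding(data: bytes, block_size: int = 16) -> bytes:
    """Remove PKCS#7 padding. Assumes (unverified) the .NET side used PKCS#7."""
    if not data or len(data) % block_size:
        raise ValueError("data is not a whole number of blocks")
    pad = data[-1]  # PKCS#7: the last byte is the number of padding bytes
    if not 1 <= pad <= block_size or data[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-pad]


# Hypothetical decrypted block (assumption): BOM + "test2" (UTF-16-LE) + 4 padding bytes.
decrypted = b"\xff\xfe" + "test2".encode("utf-16-le") + b"\x04" * 4

# ascii() keeps the output printable on any terminal.
print(ascii(decrypted.decode("utf-16")))                  # 'test2\u0404\u0404'
print(strip_non_ascii(decrypted.decode("utf-16")))        # test2
print(remove_pkcs7_padding(decrypted).decode("utf-16"))   # test2
```

Stripping non-ASCII characters also removes legitimate ones, so it is only safe when the plaintext is known to be ASCII; removing the padding does not have that limitation, and PyCryptodome provides Crypto.Util.Padding.unpad(data, 16) for the same purpose. Separately, the garbled value in the question is consistent with calling aes.decrypt twice on the same CBC cipher object: the object is stateful, so the second call uses the previous ciphertext block as its IV. Decrypting once and reusing the result, e.g. text1 = aes.decrypt(text).decode('utf-16') followed by print(text1), avoids that.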

Upvotes: 2
