Trec Apps

Reputation: 273

Rust println! prints weird characters under certain circumstances

I'm trying to write a short program (short enough that it has a simple main function). First, I list the dependency in the Cargo.toml file:

[dependencies]
passwords = {version = "3.1.3", features = ["crypto"]}

Then when I use the crate in main.rs:

extern crate passwords;

use passwords::hasher;

fn main() {
    let args: Vec<String> = std::env::args().collect();

    if args.len() < 2
    {
        println!("Error! Needed second argument to demonstrate BCrypt Hash!");
        return;
    }

    let password = args.get(1).expect("Expected second argument to exist!").trim();

    let hash_res = hasher::bcrypt(10, "This_is_salt", password);

    match hash_res
    {
        Err(_) => {println!("Failed to generate a hash!");},
        Ok(hash) => { 
            let str_hash = String::from_utf8_lossy(&hash);
            println!("Hash generated from password {} is {}", password, str_hash);
        }
    }
}

The issue arises when I run the following command:

$ target/debug/extern_crate.exe trooper1

And this becomes the output:

?sC�M����k��ed from password trooper1 is ���Ka .+:�

However, this input:

$ target/debug/extern_crate.exe trooper3

produces this:

Hash generated from password trooper3 is ��;��l�ʙ�Y1�>R��G�Ѡd

I'm pretty content with the second output, but is there something within UTF-8 that could cause the "Hash generat" portion of the output statement to be overwritten? And is there code I could use to prevent this?

Note: Code was developed in Visual Studio Code in Windows 10, and was compiled and run using an embedded Git Bash Terminal.

P.S.: I looked at similar questions such as "Rust println! problem - weird behavior inside the println macro" and "Why does my string not match when reading user input from stdin?", but those seem to be issues with newlines, and I don't think that's the problem here.

Upvotes: 1

Views: 1717

Answers (3)

ais523

Reputation: 998

Neither of the other answers so far has covered what caused the "Hash generated" part of the output to get overwritten.

Presumably you were running your program in a terminal. Terminals support various "terminal control codes" that tell the terminal things such as which formatting it should use for the text it's showing, and where on the screen the text should be output. These codes are made out of characters, just like strings are, and Unicode and UTF-8 are capable of representing the characters in question. The only difference from "regular" text is that the codes start with a "control character" rather than a more normal sort of character, but control characters have UTF-8 encodings of their own. So if you try to print some randomly generated UTF-8, there's a chance that you'll print something that causes the terminal to do something weird.

There's more than one terminal control code that could produce this particular output, but the most likely possibility is that the hash contained the byte b'\x0D', which UTF-8 decodes as the Unicode character U+000D. This is the terminal control code "CR", which means "print subsequent output at the start of the current line, overwriting anything currently there". (I use this one fairly frequently for printing progress bars, getting the new version of the progress bar to overwrite the old version of the progress bar.) The output that you posted is consistent with accidentally outputting CR, because some random Unicode full of replacement characters ended up overwriting the start of the line you were outputting – and because the code in question is only one byte long (most terminal control codes are much longer), the odds that it might appear in randomly generated UTF-8 are fairly high.

The easiest way to prevent this sort of thing happening when outputting arbitrary UTF-8 in Rust is to use the Debug implementation for str/String rather than the Display implementation – it will output control codes in escaped form rather than outputting them literally. (As the other answers say, though, in the case of hashes, it's usual to print them as hex rather than trying to interpret them as UTF-8, as they're likely to contain many byte sequences that aren't valid UTF-8.)
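A minimal sketch of the Debug approach, using only the standard library. The helper name `escaped` and the sample bytes are made up for illustration and are not taken from the question's code:

```rust
/// Decode bytes lossily, then format the result with Debug (`{:?}`)
/// so that control characters such as CR are escaped instead of
/// being sent to the terminal literally.
fn escaped(bytes: &[u8]) -> String {
    format!("{:?}", String::from_utf8_lossy(bytes))
}

fn main() {
    // Hypothetical "hash" bytes: 'A', CR (0x0D), 'B'.
    let hash = [0x41, 0x0D, 0x42];

    // With `{}` the CR would move the cursor back to the start of the line.
    // With `{:?}` it is printed as the two characters `\r`, inside quotes.
    println!("Hash is {}", escaped(&hash)); // prints: Hash is "A\rB"
}
```

Note that Debug output is wrapped in double quotes, and invalid bytes still show up as replacement characters, so this only protects the terminal; it does not make the hash readable.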

Upvotes: 1

Masklinn

Reputation: 42197

To complement the previous answer, the answer to your question of "is there something within UTF-8 that could cause the "Hash generat" portion of the output statement to be overwritten?" is:

let str_hash = String::from_utf8_lossy(&hash);

The reason's in the name: from_utf8_lossy is lossy. UTF8 is a pretty prescriptive format. You can use this function to "decode" stuff which isn't actually UTF8 (for whatever reason), but the way it will do this decoding is:

replace any invalid UTF-8 sequences with U+FFFD REPLACEMENT CHARACTER, which looks like this: �

And so that is what the odd replacement you get is: byte sequences which cannot be decoded as UTF-8 and are replaced by the "replacement character".
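A small sketch of that behaviour, assuming nothing beyond the standard library. The wrapper `decode_lossy` and the sample bytes are hypothetical, chosen only to show one invalid byte being replaced:

```rust
/// Thin wrapper around `String::from_utf8_lossy` that returns an owned String.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // "Hi" followed by 0xFF, a byte that can never appear in valid UTF-8.
    let bytes = [0x48, 0x69, 0xFF];
    let text = decode_lossy(&bytes);

    // The invalid byte has been replaced by U+FFFD; the original value is lost.
    assert_eq!(text, "Hi\u{FFFD}");
    println!("{}", text);
}
```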

And this is because hash functions generally return random-looking binary data, meaning bytes across the full range (0 to 255) and with no structure. UTF-8 is structured and absolutely does not allow such arbitrary data, so while it's possible that a hash will be valid UTF-8 (though that's not very useful), the odds are very low.

That's why hashes (and binary data in general) are usually displayed in alternative representations e.g. hex, base32 or base64.
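As a rough sketch, hex output needs nothing beyond the standard library. The helper name `to_hex` and the sample bytes are assumptions for illustration; in the question's code, the `hash` value from the `Ok` arm would take the place of the sample array:

```rust
/// Format each byte as two lowercase hex digits and join them.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    // Hypothetical hash bytes, including a CR (0x0d) and an invalid UTF-8 byte.
    let hash = [0x00, 0x0d, 0xff, 0x4b];

    // Only the characters 0-9 and a-f are printed, so nothing in the output
    // can be interpreted by the terminal as a control code.
    println!("Hash is {}", to_hex(&hash)); // prints: Hash is 000dff4b
}
```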

Upvotes: 2

Julian

Reputation: 2822

You could convert the hash to hex before printing it to prevent this.

Upvotes: 1
