What are these strange characters in HTML source?

Question

My friend runs a website and received an e-mail from Google Safesearch informing him he was hosting a phishing page. It turns out his cPanel was brute-forced (weak password) and the attackers uploaded some pages onto his server. He told me about it and I wanted to take a look at how sophisticated they are.

In many of the files, certain words or portions of text look strange. They display perfectly in a web browser, but are jumbled in the HTML source. Can anyone tell me what this is?

Examples:

WÐµlÑÐ¾mÐµ tÐ¾ ÐµÐ’Ð°y: Sign in
Ð Ð°sswÐ¾rd
FÐ¾rgÐ¾t yÐ¾ur

It's also worth noting that there is normal text throughout the page that displays perfectly also.

I assume this is to stop the detection of certain words in the page, but I'm not sure. Any information would be great.

Edit: Originally was tagged as PHP. I realised that it probably shouldn't be so removed it. Be nice, kids.

Edit edit: For clarity, it's a phishing page targeting eBay users.

The examples I posted in the original post are (in order):

eBay: Sign In
Your Password
Forgot your [password]

As such I don't believe it to be any sort of malware, but a method of encrypting text to fight detection in browsers such as Chrome (which I assume detect 'hot' words in their algorithm).

Jukka K. Korpela · Accepted Answer

They are UTF-8 encoded Cyrillic letters, and possibly other characters, chosen for their visual similarity to common Latin letters. You are viewing the page in an editor that interprets the data not as UTF-8 but as Latin 1 (or a similar 8-bit encoding).

For example, what you see as “Ð¾” is actually two bytes, 0xD0 0xBE. When interpreted as UTF-8 data (which is what browsers do here), they represent “о” U+043E CYRILLIC SMALL LETTER O. It is identical with the common Latin letter “o” in visual appearance (in any font that contains both letters), but coded as a separate character due to belonging to a different writing system. To any program, they are quite distinct characters, unless the program has been separately coded to handle “confusables”.
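As a rough illustration of the above, here is a minimal Python sketch that decodes the same two bytes both ways. The variable names are made up for the example; only the byte values 0xD0 0xBE and the character U+043E come from the explanation above.

```python
import unicodedata

# The two bytes behind what the editor shows as "Ð¾".
raw = bytes([0xD0, 0xBE])

# Read as Latin 1: every byte becomes one character, giving two junk characters.
as_latin1 = raw.decode("latin-1")

# Read as UTF-8: the two bytes together form a single character.
as_utf8 = raw.decode("utf-8")

print(len(as_latin1), [unicodedata.name(ch) for ch in as_latin1])
print(len(as_utf8), unicodedata.name(as_utf8))

# It looks like a Latin "o" but is a different code point.
print(as_utf8 == "o", hex(ord(as_utf8)), hex(ord("o")))
```

The last line shows the comparison with a Latin “o” failing, which is the point: the two letters only look alike.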

Such confusion is often intentionally created for various reasons. You are probably right in assuming that here the purpose was “to stop the detection of certain words in the page”. When e.g. “Forgot” is written using Cyrillic o’s (Fоrgоt), normal Find operations will not find it when searching for “Forgot”.
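To see how a plain substring search treats the spoofed word, and one crude way a program might flag it, here is a small Python sketch. The helper `scripts_used` is hypothetical and uses the first word of each character's Unicode name as a rough stand-in for its script; that is an assumption made for illustration, not a proper “confusables” check and not necessarily what any real detector does.

```python
import unicodedata

# "Forgot" with Cyrillic о (U+043E) in place of both Latin o's.
spoofed = "F\u043erg\u043et your password"
plain = "Forgot your password"

# A plain substring search compares code points, not appearance.
found_in_plain = "Forgot" in plain
found_in_spoofed = "Forgot" in spoofed


def scripts_used(word):
    """Hypothetical helper: first word of each letter's Unicode name,
    used here as a rough proxy for the letter's script."""
    return {unicodedata.name(ch).split()[0] for ch in word if ch.isalpha()}


# Words that mix scripts are suspicious in otherwise Latin text.
mixed_words = [w for w in spoofed.split() if len(scripts_used(w)) > 1]

print(found_in_plain, found_in_spoofed)
print(mixed_words, scripts_used(mixed_words[0]))
```

Flagging mixed-script words like this would also catch legitimate text in some cases, so treat it as a starting point only.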
