pablo

Reputation: 105

What are these strange characters in HTML source?

My friend runs a website and received an e-mail from Google Safesearch informing him that he was hosting a phishing page. It turns out his cPanel was brute-forced (weak password) and the attackers uploaded some phishing pages onto his server. He told me about it and I wanted to take a look at how sophisticated they are.

In many of the files, certain words or portions of text look strange. They display perfectly in a web browser, but are jumbled inside the HTML source. Can anyone tell me what this is?

Examples:

<title>WеlÑоmе tо еВаy: Sign in</title>
<span class="txtbox_title">Раsswоrd</span>
<a class="three" href="#">Fоrgоt yоur 

It's also worth noting that there is normal text throughout the page that displays perfectly as well.

I assume this is to stop the detection of certain words in the page, but I'm not sure. Any information would be great.

Edit: This was originally tagged as PHP. I realised that it probably shouldn't be, so I removed the tag. Be nice, kids.

Edit edit: For clarity, it's a phishing page targeting eBay users.

The examples I posted in the original post are (in order):

eBay: Sign In
Your Password
Forgot your [password]

As such I don't believe it to be any sort of malware, but a method of encrypting text to fight detection in browsers such as Chrome (which I assume detect 'hot' words in their algorithm).

Upvotes: 4

Views: 609

Answers (2)

Jukka K. Korpela

Reputation: 201886

They are UTF-8 encoded Cyrillic letters, and possibly other characters, chosen for their visual similarity to common Latin letters. You are viewing the page in an editor that interprets the data not as UTF-8 but as Latin-1.

For example, what you see as “Ð¾” is actually two bytes, 0xD0 0xBE. When interpreted as UTF-8 data (which is what browsers do here), they represent “о” U+043E CYRILLIC SMALL LETTER O. It is identical to the common Latin letter “o” in visual appearance (in any font that contains both letters), but it is coded as a separate character because it belongs to a different writing system. To any program, they are quite distinct characters, unless the program has been separately coded to handle “confusables”.
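The byte-level behaviour described above can be reproduced with a minimal Python sketch. The helper name `misread_as_latin1` is hypothetical and only mimics what an editor does when it reads UTF-8 bytes as Latin-1; it is not taken from the phishing page.

```python
import unicodedata


def misread_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as Latin-1.

    This imitates an editor that does not know the file is UTF-8.
    """
    return text.encode("utf-8").decode("latin-1")


cyrillic_o = "\u043e"                    # looks like a Latin "o"
utf8_bytes = cyrillic_o.encode("utf-8")  # the two bytes 0xD0 0xBE

print(utf8_bytes)                        # b'\xd0\xbe'
print(misread_as_latin1(cyrillic_o))     # Ð¾
print(unicodedata.name(cyrillic_o))      # CYRILLIC SMALL LETTER O
print(cyrillic_o == "o")                 # False: different code points
```

Each byte becomes one Latin-1 character, which is why every Cyrillic letter shows up as a two-character sequence in the editor.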

Such confusion is often created intentionally, for various reasons. You are probably right in assuming that here the purpose was “to stop the detection of certain words in the page”. When e.g. “Forgot” is written using Cyrillic o’s (Fоrgоt), normal Find operations will not find it when searching for “Forgot”.
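To illustrate why a plain search for “Forgot” misses the look-alike spelling, here is a small Python sketch. The functions `scripts_used` and `looks_mixed` are hypothetical helpers that guess a letter's script from the first word of its Unicode character name. This is a rough heuristic, assumed here for illustration only, not a full confusables check.

```python
import unicodedata


def scripts_used(word):
    """Return a rough set of scripts, taken from each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }


def looks_mixed(word):
    """True if the word mixes letters from more than one script."""
    return len(scripts_used(word)) > 1


spoofed = "F\u043erg\u043et"   # "Forgot" with Cyrillic o's (U+043E)

print("Forgot" in spoofed)            # False: a plain search misses it
print(sorted(scripts_used(spoofed)))  # ['CYRILLIC', 'LATIN']
print(looks_mixed(spoofed))           # True
print(looks_mixed("Forgot"))          # False
```

A word that mixes Latin and Cyrillic letters is a reasonable hint of deliberate obfuscation, although legitimate text can occasionally mix scripts too.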

Upvotes: 1

Roger

Reputation: 3256

My best guess is that it is a custom type of keylogger. The WеlÑоmе tо еВаy would be parsed by the keylogger to output some data into a database that can be mined later for important information.

My second guess is that it is a means to scare or mess with the person who owns the site.

My third guess is that the virus was written in Chinese or some other language, and when the code was converted back to UTF-8, some of the unused characters came out as the strange content.

EDIT


My fifth guess is that the phishing website was programmatically fetching the source code of the eBay site and parsing it into its own HTML file, and that eBay has its own countermeasures against this type of attack, scrambling the letters in the source code.

If so, there must be some JavaScript that undoes the scrambling in the original source code.

Upvotes: 1
