Reputation: 16666
I wrote two functions in PHP, str_to_utf8() and seems_utf8() (well, they are composed of parts I borrowed from other code). Now I'm writing unit tests for them and I want to make sure the tests are sound. I currently use the ones I took from Facebook:
    public function test_str_to_utf8()
    {
        // Make sure ASCII characters (including control characters) are left unchanged
        $this->assertEquals( "this\x01 is a \x7f test string", str_to_utf8( "this\x01 is a \x7f test string" ) );

        // Make sure valid UTF-8 characters are left unchanged
        $this->assertEquals( "\xc3\x9c \xc3\xbc \xe6\x9d\xb1!", str_to_utf8( "\xc3\x9c \xc3\xbc \xe6\x9d\xb1!" ) );

        // Test long strings (the call is currently disabled, so the assertion below is only a placeholder)
        #str_to_utf8( str_repeat( 'x', 1024 * 1024 ) );
        $this->assertEquals( TRUE, TRUE );

        // Test some invalid UTF-8 to see if each bad byte is replaced with U+FFFD (\xEF\xBF\xBD)
        $input = "\xc3 this has \xe6\x9d some invalid utf8 \xe6";
        $expect = "\xEF\xBF\xBD this has \xEF\xBF\xBD\xEF\xBF\xBD some invalid utf8 \xEF\xBF\xBD";
        $this->assertEquals( $expect, str_to_utf8( $input ) );
    }
Are those valid test cases?
Upvotes: 1
Views: 742
Reputation: 1122
I find this resource useful when testing UTF-8.
If you use any of the non-Latin-1 text, you'll need to either ensure your PHP file is saved as UTF-8 or pre-escape the characters as byte sequences (for example "\xc3\xbc" instead of a literal ü).
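As a rough illustration of the pre-escaping option, here is a minimal sketch that turns the non-ASCII bytes of a sample string into "\xNN" escapes, so the result can be pasted into a double-quoted literal in a test file that stays pure ASCII. The helper name escape_non_ascii() is hypothetical and not part of the original code. It also assumes the sample contains no backslashes, double quotes, or dollar signs, which would need their own escaping.

```php
<?php
// Hypothetical helper (an assumption for illustration, not from the post):
// replace every byte >= 0x80 with a literal "\xNN" escape sequence.
function escape_non_ascii(string $bytes): string
{
    return preg_replace_callback(
        '/[\x80-\xff]/',                            // no /u flag: match raw bytes
        function (array $m): string {
            return sprintf('\\x%02x', ord($m[0]));  // e.g. byte 0xC3 becomes \xc3
        },
        $bytes
    );
}

// The sample is itself written with escapes, so this file does not depend on
// the encoding it is saved in.
echo escape_non_ascii("\xc3\xbc ok"), "\n";
```

Running this once over a sample and pasting the output between double quotes gives a literal like the ones already used in the test above, independent of how the PHP file is saved.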
Upvotes: 1