Bart Friederichs

Reputation: 33553

PostgreSQL does not accept \0 in a UTF-8 string, while C# does

I have some data that contains a \0 (NUL) byte, and it seems to be valid UTF-8:

using System;
using System.Text;
                    
public class Program
{
    public static void Main()
    {
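        // The bytes for "A", "B" and a NUL (0x00)
        byte[] b = new byte[3];
        b[0] = 65;
        b[1] = 66;
        b[2] = 0;
        
        // Decodes without error: 0x00 is the UTF-8 encoding of U+0000
        Console.WriteLine(Encoding.UTF8.GetString(b));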
    }
}
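A stricter version of the check above follows. It is a sketch that assumes a .NET runtime with `System.Linq`. `new UTF8Encoding(false, true)` creates an encoder that emits no BOM and throws on malformed input. If this encoder decodes the bytes and re-encodes them unchanged, then 0x00 really is valid UTF-8. The class name `NulRoundTrip` is made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class NulRoundTrip
{
    // Strict UTF-8: no BOM, and throw DecoderFallbackException on malformed bytes.
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Decodes the bytes with the strict decoder, then re-encodes the result.
    // If 0x00 were invalid UTF-8, GetString would throw here.
    public static byte[] RoundTrip(byte[] utf8Bytes)
    {
        string s = StrictUtf8.GetString(utf8Bytes);
        return StrictUtf8.GetBytes(s);
    }

    public static void Main()
    {
        byte[] b = { 65, 66, 0 };
        string s = StrictUtf8.GetString(b);

        Console.WriteLine(s.Length);                      // 3: 'A', 'B' and U+0000
        Console.WriteLine((int)s[2]);                     // 0
        Console.WriteLine(RoundTrip(b).SequenceEqual(b)); // True
    }
}
```

The decoder does not throw, and the bytes come back identical. From .NET's point of view, the NUL is an ordinary character.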

That code works fine. But when I try to update a record in PostgreSQL with this data, it complains:

22021: invalid byte sequence for encoding "UTF8": 0x00

The data shouldn't be there anyway, but how can one system accept it while the other rejects it? I assume they both implement the same standard.

Upvotes: 1

Views: 624

Answers (1)

Lukasz Szozda

Reputation: 176094

From the documentation, 8.3. Character Types:

+-----------------------------------+----------------------------+
|               Name                |        Description         |
+-----------------------------------+----------------------------+
| character varying(n), varchar(n)  | variable-length with limit |
| character(n), char(n)             | fixed-length, blank padded |
| text                              | variable unlimited length  |
+-----------------------------------+----------------------------+

The characters that can be stored in any of these data types are determined by the database character set, which is selected when the database is created. Regardless of the specific character set, the character with code zero (sometimes called NUL) cannot be stored. For more information refer to Section 23.3.
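The two systems do not actually disagree about UTF-8. Byte 0x00 is the valid one-byte encoding of U+0000, and .NET decodes it into an ordinary char. PostgreSQL, however, refuses to store that character in any of its character types.

If the NULs are unwanted (as the question says), a common workaround is to strip them before sending the value. The helper below is a hypothetical sketch; it is not part of Npgsql or PostgreSQL. `string.Replace(string, string)` performs an ordinal search, so it matches the NUL character exactly. If the zero bytes are meaningful, the usual alternative is a `bytea` column, which stores raw bytes.

```csharp
using System;

public static class PgTextSanitizer
{
    // Hypothetical helper: removes every U+0000 (NUL) character so the value
    // can be stored in a PostgreSQL text/varchar/char column.
    // Only appropriate when the NULs are junk; otherwise use bytea.
    public static string StripNul(string value)
    {
        if (value == null) return null;
        return value.Replace("\0", string.Empty);
    }

    public static void Main()
    {
        string dirty = "AB\0";
        string clean = StripNul(dirty);

        Console.WriteLine(clean.Length);  // 2
        Console.WriteLine(clean == "AB"); // True
    }
}
```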

Upvotes: 1
