Postgres - character sets and encodings

Question

I was wondering if someone can help me understand what's going on/wrong with my Postgres data please...

I'll explain things below - but I guess ultimately the questions I have are :

What characterset/encoding should I be using (i.e. what is best practise)?
IF the answer is UTF8, then will certain characters (e.g. UK pound symbols) always look "funny" in the database?

I've got a database that has a table with data about flights (although obviously it could be anything really), defined as follows...

CREATE TABLE public.flight (
    flightid integer DEFAULT nextval('public.flight_seq'::regclass) NOT NULL,
    tripid integer NOT NULL,
    flightdatedeparted date NOT NULL,
    flightairportdeparted text NOT NULL,
    flightairportarrived text NOT NULL,
    flightairline text NOT NULL,
    flightdetails text,
    flightdayflightnumber integer DEFAULT 1 NOT NULL,
    flightdistance numeric
);

Now, when I enter data into it via a web front end connected to this database then I end up with data something like...

holidayinfo=# select distinct * from flight where flightid=97;
-[ RECORD 1 ]---------+---------------------------------------
flightid              | 97
tripid                | 36
flightdatedeparted    | 2004-05-14
flightairportdeparted | LHR
flightairportarrived  | WAW
flightairline         | British Airways
flightdetails         | Hotline, ┬ú82.40, BA850, 13:40 -> 17:05
flightdayflightnumber | 1
flightdistance        | 912.7

However, the data that I'd entered into the web form for the field "flightdetails" was actually...

Hotline, £82.40, BA850, 13:40 -> 17:05

Now, when I dump the data and look at it in Notepad++, depending on what encoding I use then sometimes I see it correctly as the pound symbol (when I choose ANSI) and other times it's incorrect as xA3 (when I choose UTF8).

At least when it's stored in Postgres as the "funny" value then it also displays correctly on my webpage when I retrieve the data - so that's good.

If I try to manually update the value via psql then I get the following...

holidayinfo=# update flight set flightdetails='Hotline, £82.40, BA850, 13:40 -> 17:05' where flightid=97;
ERROR:  invalid byte sequence for encoding "UTF8": 0x9c

In terms of how my database is created and what client encoding its using then I've got the following...

holidayinfo=# \l
                                                   List of databases
    Name     |  Owner   | Encoding |           Collate           |            Ctype            |   Access privileges
-------------+----------+----------+-----------------------------+-----------------------------+-----------------------
 holidayinfo | postgres | UTF8     | English_United Kingdom.1252 | English_United Kingdom.1252 |
 leagueinfo  | postgres | UTF8     | English_United Kingdom.1252 | English_United Kingdom.1252 |
 postgres    | postgres | UTF8     | English_United Kingdom.1252 | English_United Kingdom.1252 |
 template0   | postgres | UTF8     | English_United Kingdom.1252 | English_United Kingdom.1252 | =c/postgres          +
             |          |          |                             |                             | postgres=CTc/postgres
 template1   | postgres | UTF8     | English_United Kingdom.1252 | English_United Kingdom.1252 | =c/postgres          +
             |          |          |                             |                             | postgres=CTc/postgres
(5 rows)


holidayinfo=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

Maybe this is all working as designed, but I'm just confused as to how things should be?

Ultimately, I'd love to be able to have the data stored so that I can see it as the pound sign AND be entered/retrieved/displayed as the pound sign.

The former is desirable so that if ever I need to look at the data then I can see what the real data is - not have to make assumptions on what character "┬ú" really means.

Also, this problem scales up when there are other characters having the same "issue" such as a hyphen (-) showing as "ÔÇô" and or an apostrophe (') showing as "ÔÇÖ".

Thanks in advance!

Postgres - character sets and encodings

Answers (1)

Related Questions