Reputation: 19571
I have a database that was set up with the default character set SQL_ASCII. I want to switch it to UNICODE. Is there an easy way to do that?
Upvotes: 103
Views: 248195
Reputation: 880
For docker compose, Here is my working code:
Filename: dc-postgres-adminer.yml
name: docker-postgres-data
services:
postgres:
image: postgres:latest
container_name: postgres
ports:
- 5432:5432
environment:
POSTGRES_PASSWORD: sud100
LANG: en_US.UTF-8
LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8
volumes:
- docker-postgres-volume:/var/lib/postgresql/data
restart: unless-stopped
command: ["postgres", "-c", "default_transaction_isolation=serializable"]
adminer:
image: adminer:latest
container_name: adminer-postgres
environment:
ADMINER_DEFAULT_SERVER: postgres
restart: unless-stopped
ports:
- 8081:8080
depends_on:
- postgres
volumes:
docker-postgres-volume:
# external: true
driver: local
name: docker-postgres-volume
driver_opts:
type: none
device: /home/sud/SudhakarK/WORK/2025/Dockers/DB/postgres_data
o: bind
docker compose -f dc-postgres-adminer.yml up
HINT: Username: postgres
and Password: sud100
Upvotes: 0
Reputation: 20875
# dump into file
pg_dump myDB > /tmp/myDB.sql
# create an empty db with the right encoding (on older versions the escaped single quotes are needed!)
psql -c 'CREATE DATABASE "tempDB" WITH OWNER = "myself" LC_COLLATE = '\''de_DE.utf8'\'' TEMPLATE template0;'
# import in the new DB
psql -d tempDB -1 -f /tmp/myDB.sql
# rename databases
psql -c 'ALTER DATABASE "myDB" RENAME TO "myDB_wrong_encoding";'
psql -c 'ALTER DATABASE "tempDB" RENAME TO "myDB";'
# see the result
psql myDB -c "SHOW LC_COLLATE"
Once everything runs fine, you can drop the database myDB_wrong_encoding
Upvotes: 10
Reputation: 49
I had the same issue in postgres 11 and I did change the database encoding using the below steps,
to update all the list of encoding
SET client_encoding = 'UTF8';
UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';
update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'dbname' ;
make sure to apply the update
statment in template0
and template1
and postgres
Database
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';
UPDATE 1
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='template0';
UPDATE 1
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='template1';
UPDATE 1
Upvotes: 4
Reputation: 7093
To change the encoding of your database:
Make sure the client encoding is set correctly during all this.
Source: http://archives.postgresql.org/pgsql-novice/2006-03/msg00210.php
Upvotes: 79
Reputation: 118
Daniel Kutik's answer is correct, but it can be even more safe, with database renaming.
So, the truly safe way is:
In case of emergency, just rename DBs back
Upvotes: 9
Reputation: 5349
Dumping a database with a specific encoding and try to restore it on another database with a different encoding could result in data corruption. Data encoding must be set BEFORE any data is inserted into the database.
Check this : When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data.
And this : Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 22.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.
I would rather rebuild everything from the begining properly with a correct local encoding on your debian OS as explained here :
su root
Reconfigure your local settings :
dpkg-reconfigure locales
Choose your locale (like for instance for french in Switzerland : fr_CH.UTF8)
Uninstall and clean properly postgresql :
apt-get --purge remove postgresql\*
rm -r /etc/postgresql/
rm -r /etc/postgresql-common/
rm -r /var/lib/postgresql/
userdel -r postgres
groupdel postgres
Re-install postgresql :
aptitude install postgresql-9.1 postgresql-contrib-9.1 postgresql-doc-9.1
Now any new database will be automatically be created with correct encoding, LC_TYPE (character classification), and LC_COLLATE (string sort order).
Upvotes: 16
Reputation: 133732
First off, Daniel's answer is the correct, safe option.
For the specific case of changing from SQL_ASCII to something else, you can cheat and simply poke the pg_database catalogue to reassign the database encoding. This assumes you've already stored any non-ASCII characters in the expected encoding (or that you simply haven't used any non-ASCII characters).
Then you can do:
update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'thedb'
This will not change the collation of the database, just how the encoded bytes are converted into characters (so now length('£123')
will return 4 instead of 5). If the database uses 'C' collation, there should be no change to ordering for ASCII strings. You'll likely need to rebuild any indices containing non-ASCII characters though.
Caveat emptor. Dumping and reloading provides a way to check your database content is actually in the encoding you expect, and this doesn't. And if it turns out you did have some wrongly-encoded data in the database, rescuing is going to be difficult. So if you possibly can, dump and reinitialise.
Upvotes: 119