Reputation: 173
I am working with Japanese data, and some of the Japanese strings contain full-width English letters and numbers, for example:
ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ＆Ｇ, 篠路７－１
I want to convert these full-width letters and numbers to half-width, using a function or any other approach.
For the input above, the output should look like: SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1
If anyone knows the best way to start, I would appreciate it.
Upvotes: 0
Views: 1963
Reputation: 1411
Postgres 13 (but not 12) has a normalize() text function:
normalize ( text [, form ] ) → text
Converts the string to the specified Unicode normalization form. The optional form key word specifies the form: NFC (the default), NFD, NFKC, or NFKD. This function can only be used when the server encoding is UTF8.
normalize(U&'\0061\0308bc', NFC) → U&'\00E4bc'
Your example results in
SELECT normalize('ＳＹＳＫＥＮ，　松井ケ丘３，　コメリＨ＆Ｇ，　篠路７－１', NFKD);
normalize
---------------------------------------
SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1
Note that characters such as ，, the full-width space, －, and ＆ were also transformed.
The NFKD normalization algorithm also affects many other characters, in addition to full-width and half-width characters. That may or may not be appropriate depending on why you are normalizing.
For example, you can remove the "fraktur" style "font" using NFKD.
SELECT normalize('𝔣𝔯𝔞𝔨𝔱𝔲𝔯', NFKD);
normalize
-----------
fraktur
(1 row)
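One side effect worth knowing for Japanese text: NFKD is a decomposing form, so it also splits voiced kana such as ガ into a base character plus a combining mark, while NFKC recomposes them. The sample string 'ガイド１' below is my own illustration, not from the question; this is only a sketch of how the two forms could be compared on Postgres 13 or later with a UTF8 server encoding.

```sql
-- 'ガイド１' has two voiced katakana (ガ, ド) and one full-width digit.
SELECT length(normalize('ガイド１', NFKD)) AS nfkd_chars,      -- ガ and ド are each split in two
       length(normalize('ガイド１', NFKC)) AS nfkc_chars,      -- kana stay composed
       normalize('ガイド１', NFKC) = 'ガイド1' AS nfkc_keeps_kana;
```

If the goal is only full-width to half-width conversion, NFKC is probably the less surprising choice, since the stored kana remain comparable with ordinary composed text.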
Check the Unicode specification for details on how the normalization algorithm works:
https://unicode.org/reports/tr15/#Norm_Forms
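To preview which values a normalization would actually change before applying it, Postgres 13 also provides the IS NORMALIZED predicate. A minimal sketch, using two inline sample values of my own rather than a real table:

```sql
-- Hypothetical sample rows; only the second contains a full-width character.
SELECT name,
       name IS NFKC NORMALIZED AS already_normalized
FROM (VALUES ('SYSKEN'), ('松井ケ丘３')) AS t(name);
```

Rows where the predicate is false are the ones normalize(name, NFKC) would rewrite, so the same condition could be used in a WHERE clause to limit an UPDATE.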
Upvotes: 1
Reputation: 2907
How about using the translate() function?
-- prepare test data
CREATE TABLE address (
id integer,
name text
);
INSERT INTO address VALUES (1, 'ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ&Ｇ, 篠路７-１');
-- show test data
SELECT * FROM address;
-- convert full-width digits and uppercase letters to half-width
UPDATE address SET name = translate(name,
'０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ',
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
);
-- see the converted data
SELECT * from address;
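After the update, the name column contains "SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1". Only the characters listed in the first string are converted; anything else is left as it is.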
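To cover lowercase letters and symbols such as ＆ and － without typing every character, the two translate() arguments can be generated from code points. The sketch below assumes a UTF8 server encoding (chr() needs it for code points above 127) and relies on the fact that the full-width ASCII block U+FF01 to U+FF5E sits exactly 0xFEE0 (65248) above U+0021 to U+007E; the ideographic space U+3000 is mapped separately. It does not need normalize(), so it should also work before Postgres 13.

```sql
WITH map AS (
    -- Build both strings in the same code point order so positions line up.
    SELECT string_agg(chr(cp), '' ORDER BY cp)         AS fullwidth,
           string_agg(chr(cp - 65248), '' ORDER BY cp) AS halfwidth
    FROM generate_series(65281, 65374) AS cp   -- U+FF01 .. U+FF5E
)
SELECT translate('ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ＆Ｇ, 篠路７－１',
                 fullwidth || chr(12288),      -- plus U+3000 ideographic space
                 halfwidth || ' ') AS converted
FROM map;
```

Unlike NFKC or NFKD, this touches only the listed range, so kanji, kana, and other compatibility characters are left as they are.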
Upvotes: 3