Pil Kwon

Reputation: 173

PostgreSQL: convert Japanese Full-Width to Half-Width

I am working with Japanese data, and some of the Japanese strings contain full-width English letters and numbers.

ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ＆Ｇ, 篠路７－１ are some examples.

I want to convert these full-width English letters and numbers to half-width, using a function or any other available method.

The output for the input above should look like: SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1

If anyone knows the best way to start, I would appreciate it.

Upvotes: 0

Views: 1963

Answers (2)

Curtis Fenner

Reputation: 1411

PostgreSQL 13 and later (but not 12) has a normalize() text function:

https://www.postgresql.org/docs/13/functions-string.html#:~:text=normalize%20(%20text%20%5B%2C%20form%20%5D%20)%20%E2%86%92%20text

normalize ( text [, form ] ) → text

Converts the string to the specified Unicode normalization form. The optional form key word specifies the form: NFC (the default), NFD, NFKC, or NFKD. This function can only be used when the server encoding is UTF8.

normalize(U&'\0061\0308bc', NFC) → U&'\00E4bc'

Your example results in:

SELECT normalize('ＳＹＳＫＥＮ，　松井ケ丘３，　コメリＨ＆Ｇ，　篠路７－１', NFKD);

               normalize               
---------------------------------------
 SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1

Note that characters such as the full-width comma (，), the full-width space, the full-width hyphen (－), and the full-width ampersand (＆) were also transformed.

The NFKD normalization algorithm also affects many other characters, in addition to full-width and half-width characters. That may or may not be appropriate depending on why you are normalizing.

For example, you can remove the "fraktur" style "font" using NFKD.

SELECT normalize('𝔣𝔯𝔞𝔨𝔱𝔲𝔯', NFKD);
 normalize 
-----------
 fraktur
(1 row)
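
To get a feel for what the compatibility forms do to Japanese text before running them on real data, here is a minimal Python sketch using the standard unicodedata module. Python is my own choice for quick local experiments, not part of the original answer. PostgreSQL's normalize() implements the same Unicode normalization forms, so I would expect matching results, but I have not checked these exact strings against a server.

```python
import unicodedata

# Full-width Latin letters, digits and the full-width space fold to ASCII
# under both compatibility forms.
sample = "ＳＹＳＫＥＮ　松井ケ丘３"
nfkc = unicodedata.normalize("NFKC", sample)
nfkd = unicodedata.normalize("NFKD", sample)

# Voiced kana: NFKD splits ガ (U+30AC) into カ (U+30AB) plus the combining
# mark U+3099, while NFKC keeps it as a single code point.
voiced_nfkd = unicodedata.normalize("NFKD", "ガ")
voiced_nfkc = unicodedata.normalize("NFKC", "ガ")

# Half-width katakana is widened by the compatibility forms, which may be
# the opposite of what you want: ｺﾒﾘ becomes コメリ.
widened = unicodedata.normalize("NFKC", "ｺﾒﾘ")

print(nfkc)              # SYSKEN 松井ケ丘3
print(len(voiced_nfkd))  # 2
print(widened)           # コメリ
```

If the decomposed kana are a problem, NFKC is worth trying in place of NFKD, since it recomposes characters after applying the compatibility mappings.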

Check the Unicode specification for details on how the normalization algorithm works:

https://unicode.org/reports/tr15/#Norm_Forms

Upvotes: 1

akky

Reputation: 2907

How about using the translate() function?

-- prepare test data
CREATE TABLE address (
    id integer,
    name text
);
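INSERT INTO address VALUES (1, 'ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ＆Ｇ, 篠路７－１');

-- show test data
SELECT * from address;

-- convert full-width digits and uppercase letters to half-width
-- (first string: characters to find, second string: their replacements)
UPDATE address SET name = translate(name,
    '０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ',
    '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
);

-- see the converted data
SELECT * from address;

This code changes the name column to "SYSKEN, 松井ケ丘3, コメリH＆G, 篠路7－1". translate() only converts the characters listed in its second argument, so ＆ and － stay full-width here; add them (and lowercase letters, if needed) to both strings to convert them as well.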
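
The same character-by-character idea can be extended to the whole full-width ASCII block without typing every character, because U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII U+0021 to U+007E. Below is a minimal Python sketch of that mapping. The names FULLWIDTH_TO_ASCII and to_halfwidth are hypothetical, and Python is used here only as an assumption for easy local testing; in PostgreSQL the same pairs of characters could be passed to translate().

```python
# Map every full-width ASCII variant (U+FF01..U+FF5E) to its ASCII counterpart.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
# The ideographic (full-width) space is outside that block, so add it by hand.
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def to_halfwidth(text: str) -> str:
    """Convert full-width ASCII variants to ASCII; leave all other characters alone."""
    return text.translate(FULLWIDTH_TO_ASCII)


converted = to_halfwidth("ＳＹＳＫＥＮ，　松井ケ丘３，　コメリＨ＆Ｇ，　篠路７－１")
print(converted)  # SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1
```

Unlike Unicode normalization, an explicit table like this touches nothing outside the listed range, so kanji, kana (including half-width katakana), and other symbols are left exactly as they were.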

Upvotes: 3
