How to convert double byte character/string to single byte and vice versa?

Question

I am working with Japanese text and I have two requirements.

Convert all characters in a string into double-byte characters. The string can contain single-byte characters, double-byte characters, or both, but the resulting string should contain double-byte characters only.

eg: 東京都中央区晴海1丁目8番11号

expected output: 東京都中央区晴海<1>丁目<8>番<11>号, where everything inside <> should be a double-byte character.

Convert all characters into single-byte characters. The input is similar to requirement 1, but the resulting string should contain only single-byte characters.

eg: ＡＤＯＲＥＳ，Ｉｎｃ．

expected output: ADORES, INC.

I am reading this data from a CSV file that contains nearly 300 columns. Only 3 columns need these operations; the rest should remain unchanged.

I found the code below online, but it throws an error. raw_comp_name contains the data from the CSV. raw_comp_name.encode(encoding='utf-8').decode('ascii')

Jaha · Accepted Answer

Information

Japanese text uses the two character widths below. Double-byte (full-width) characters are displayed twice as wide as normal (half-width) alphabetic characters.

Double-Byte Character (Zenkaku, 全角)
Single-Byte Character (Hankaku, 半角)
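The width distinction above can be inspected from Python's standard library. The following is a minimal sketch, not part of the original answer: `is_fullwidth` is a hypothetical helper name, and the sample characters are chosen only for illustration. It also shows that "double byte" refers to legacy encodings such as Shift_JIS; in UTF-8 these characters take three bytes, which is why decoding them as ASCII (as in the question) raises an error.

```python
import unicodedata

# Sample characters: ASCII letter/digit, their full-width forms,
# a kanji, and a half-width katakana.
samples = ["A", "Ａ", "1", "１", "東", "ｱ"]

for ch in samples:
    # east_asian_width returns 'F' (fullwidth), 'W' (wide),
    # 'H' (halfwidth), 'Na' (narrow), 'A' (ambiguous) or 'N' (neutral).
    width = unicodedata.east_asian_width(ch)
    utf8_len = len(ch.encode("utf-8"))
    print(f"{ch!r}: U+{ord(ch):04X} width={width} utf8_bytes={utf8_len}")


def is_fullwidth(ch):
    """Hypothetical helper: True for characters displayed at full width."""
    return unicodedata.east_asian_width(ch) in ("F", "W")


# Encoding to UTF-8 and decoding as ASCII does not change the width;
# it fails as soon as a non-ASCII byte is met.
try:
    "Ａ".encode("utf-8").decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print("ascii decode failed:", decode_failed)
```

A check such as `all(is_fullwidth(c) for c in text)` could then be used to verify the result of a conversion.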

You can get more details from this link.

Answer

You can use the jaconv module (installable with pip). It has both single-byte to double-byte and double-byte to single-byte functions. See the module documentation for more details.

Attached example code below:

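import jaconv

# jaconv converts only kana by default, so ASCII letters/symbols and digits
# have to be enabled explicitly with ascii=True and digit=True.
hankaku_text = '東京都中央区晴海1丁目8番11号'
converted_zenkaku = jaconv.hankaku2zenkaku(hankaku_text, ascii=True, digit=True)
print(converted_zenkaku)

zenkaku_text = "ＡＤＯＲＥＳ，Ｉｎｃ．"
converted_hankaku = jaconv.zenkaku2hankaku(zenkaku_text, ascii=True, digit=True)
print(converted_hankaku)

output:
東京都中央区晴海１丁目８番１１号
ADORES,Inc.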
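If adding a dependency is not an option, the ASCII-range part of the conversion can be sketched with the standard library alone. This is an illustration under stated assumptions rather than a replacement for jaconv: the names `to_fullwidth`, `to_halfwidth` and `convert_columns`, and the column names `address` and `company`, are hypothetical. It relies on the fact that the full-width forms U+FF01 to U+FF5E mirror ASCII U+0021 to U+007E at a fixed offset; kana are not handled. The second half shows one way to touch only the selected CSV columns and leave the others unchanged.

```python
import csv
import io

# Full-width forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E
# at a fixed code point offset.
_OFFSET = 0xFEE0
TO_FULL = {cp: cp + _OFFSET for cp in range(0x21, 0x7F)}
TO_FULL[0x20] = 0x3000  # ASCII space -> ideographic (full-width) space
TO_HALF = {full: half for half, full in TO_FULL.items()}


def to_fullwidth(text):
    """Replace ASCII letters, digits and symbols with full-width forms."""
    return text.translate(TO_FULL)


def to_halfwidth(text):
    """Replace full-width ASCII forms with plain ASCII characters."""
    return text.translate(TO_HALF)


def convert_columns(rows, converters):
    """Apply each converter to its named column; other columns are untouched."""
    for row in rows:
        for column, convert in converters.items():
            row[column] = convert(row[column])
        yield row


# In-memory stand-in for the real CSV file (column names are assumptions).
sample = io.StringIO(
    "id,address,company\n"
    "1,東京都中央区晴海1丁目8番11号,ＡＤＯＲＥＳ，Ｉｎｃ．\n"
)
reader = csv.DictReader(sample)
converted = list(
    convert_columns(reader, {"address": to_fullwidth, "company": to_halfwidth})
)
print(converted[0])
```

Note that width conversion does not change letter case, so turning "Inc." into "INC." as in the question's expected output would need a separate `str.upper()` call.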
