Rajesh Upadhayaya

Reputation: 81

How to convert double byte character/string to single byte and vice versa?

I am working with Japanese text and I have two requirements.

  1. Convert all characters in a string into double-byte characters. The string can contain single-byte characters, double-byte characters, or both, but the resulting string should contain only double-byte characters.

e.g.: 東京都中央区晴海1丁目8番11号

expected output: 東京都中央区晴海<1>丁目<8>番<11>号, where every character shown inside <> should be a double-byte character.

  2. Convert all characters into single-byte characters. The string is similar to the one in requirement 1, but the resulting string should contain only single-byte characters.

e.g.: ADORES,Inc.

expected output: ADORES, INC.
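Both requirements boil down to a fixed code-point mapping: the full-width forms of printable ASCII (U+FF01 to U+FF5E) sit exactly 0xFEE0 above their ASCII counterparts, and the ASCII space pairs with the ideographic space U+3000. The sketch below uses only the standard library; `to_fullwidth` and `to_halfwidth` are hypothetical helper names, and the mapping deliberately leaves kanji and all katakana (including half-width katakana) untouched.

```python
# Printable ASCII U+0021..U+007E maps to full-width U+FF01..U+FF5E (offset 0xFEE0);
# the ASCII space maps to the ideographic space U+3000.
HALF = "".join(chr(c) for c in range(0x21, 0x7F)) + " "
FULL = "".join(chr(c + 0xFEE0) for c in range(0x21, 0x7F)) + "\u3000"

TO_FULL = str.maketrans(HALF, FULL)  # single-byte -> double-byte
TO_HALF = str.maketrans(FULL, HALF)  # double-byte -> single-byte


def to_fullwidth(text: str) -> str:
    """Widen ASCII characters; kanji and kana pass through unchanged."""
    return text.translate(TO_FULL)


def to_halfwidth(text: str) -> str:
    """Narrow full-width ASCII variants; kanji and kana pass through unchanged."""
    return text.translate(TO_HALF)


print(to_fullwidth("東京都中央区晴海1丁目8番11号"))  # 東京都中央区晴海１丁目８番１１号
print(to_halfwidth("ＡＤＯＲＥＳ，Ｉｎｃ．"))  # ADORES,Inc.
```

This changes only character width; it does not change letter case or insert spaces.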

I am reading this data from a CSV file that contains nearly 300 columns. Only 3 of those columns need these conversions; the rest should remain unchanged.
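A minimal sketch of converting only selected columns while copying every other column through unchanged, using the standard `csv` module. The column names `address` and `company_name` and the helper `convert_columns` are hypothetical stand-ins for the real headers; the width mapping is the ASCII offset mapping (0xFEE0), not jaconv.

```python
import csv
import io

# Fixed code-point mapping between printable ASCII and its full-width forms
HALF = "".join(chr(c) for c in range(0x21, 0x7F)) + " "
FULL = "".join(chr(c + 0xFEE0) for c in range(0x21, 0x7F)) + "\u3000"
TO_FULL = str.maketrans(HALF, FULL)
TO_HALF = str.maketrans(FULL, HALF)


def convert_columns(src, dst, to_full=(), to_half=()):
    """Copy every row from src to dst, converting only the named columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in to_full:
            row[col] = row[col].translate(TO_FULL)
        for col in to_half:
            row[col] = row[col].translate(TO_HALF)
        writer.writerow(row)  # all other columns are written back untouched


# Hypothetical sample: "address" and "company_name" stand in for the real headers
src = io.StringIO(
    "id,address,company_name\n"
    "1,東京都中央区晴海1丁目8番11号,ＡＤＯＲＥＳ，Ｉｎｃ．\n"
)
dst = io.StringIO()
convert_columns(src, dst, to_full=["address"], to_half=["company_name"])
print(dst.getvalue())
```

For real files, open both with `newline=""` (as the `csv` module expects) and with the file's actual encoding; a Japanese CSV may be `cp932` (Shift_JIS) rather than UTF-8. The writer quotes `ADORES,Inc.` automatically because the converted value now contains an ASCII comma.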

I found the code below online, but it throws an error. raw_comp_name contains the data from the CSV:

raw_comp_name.encode(encoding='utf-8').decode('ascii')
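The encode/decode approach cannot work because "single-byte" and "double-byte" here describe display width (hankaku/zenkaku), not the bytes of a UTF-8 string. The sketch below, using a hypothetical `raw_comp_name` value, shows that full-width letters take 3 bytes each in UTF-8, so decoding those bytes as ASCII raises `UnicodeDecodeError`; the "double-byte" name comes from Shift_JIS (`cp932`), where these characters really are two bytes.

```python
raw_comp_name = "ＡＤＯＲＥＳ，Ｉｎｃ．"  # hypothetical value read from the CSV

encoded = raw_comp_name.encode("utf-8")
# 11 characters become 33 bytes: each full-width character takes 3 bytes in UTF-8
print(len(raw_comp_name), len(encoded))

try:
    encoded.decode("ascii")
    reason = None
except UnicodeDecodeError as exc:
    # ASCII only covers bytes 0x00-0x7F, so the first UTF-8 byte (0xEF) is rejected
    reason = exc.reason
print(reason)  # ordinal not in range(128)

# In Shift_JIS (cp932) a full-width letter is two bytes and an ASCII letter is one
print(len("Ａ".encode("cp932")), len("A".encode("cp932")))  # 2 1
```

Width conversion is a character-to-character mapping on `str`, so no encoding step is needed at all.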

Upvotes: 2

Views: 2763

Answers (1)

Jaha

Reputation: 159

Information


Japanese text uses two character widths. Double-byte characters are twice as wide as normal alphabetic characters:

  • Double-Byte Character (Zenkaku, 全角)
  • Single-Byte Character (Hankaku, 半角)

You can get more details from this link.
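Unicode records this distinction in the East Asian Width property, which the standard library exposes as `unicodedata.east_asian_width`. In the sketch below, `is_double_byte` is a hypothetical helper that treats the classes `F` (fullwidth) and `W` (wide) as zenkaku; `H` (halfwidth) and `Na` (narrow) are hankaku, and the context-dependent `A` (ambiguous) class is simply treated as single-byte here, which is an assumption.

```python
import unicodedata


def is_double_byte(ch: str) -> bool:
    """Treat East Asian Width 'F' (fullwidth) and 'W' (wide) as zenkaku."""
    return unicodedata.east_asian_width(ch) in ("F", "W")


# A/1 are narrow ASCII, Ａ/１ their full-width forms,
# ｱ is half-width katakana, ア full-width katakana, 東 a kanji
for ch in ["A", "Ａ", "1", "１", "ｱ", "ア", "東"]:
    print(ch, unicodedata.east_asian_width(ch), is_double_byte(ch))
```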

Answer


You can use the jaconv module (installable with pip). It has both single-byte to double-byte and double-byte to single-byte conversion functions. See the module documentation link for more details.

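Example code below. jaconv converts only katakana by default, so ascii=True and digit=True are needed to convert Latin letters, punctuation, and digits:

import jaconv

# Half-width (hankaku) input: ASCII digits mixed with kanji
hankaku_text = '東京都中央区晴海1丁目8番11号'
# h2z: half-width to full-width (hankaku -> zenkaku)
converted_zenkaku = jaconv.h2z(hankaku_text, ascii=True, digit=True)
print(converted_zenkaku)

# Full-width (zenkaku) input
zenkaku_text = 'ＡＤＯＲＥＳ，Ｉｎｃ．'
# z2h: full-width to half-width (zenkaku -> hankaku)
converted_hankaku = jaconv.z2h(zenkaku_text, ascii=True, digit=True)
print(converted_hankaku)

output:
東京都中央区晴海１丁目８番１１号
ADORES,Inc.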
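For the double-byte to single-byte direction only, the standard library's NFKC normalization is a possible alternative that needs no extra package. This is a different technique from jaconv, and the sketch shows its side effects: half-width katakana are folded the other way (to full-width), and other compatibility characters such as circled digits are rewritten as well.

```python
import unicodedata

zenkaku_text = "ＡＤＯＲＥＳ，Ｉｎｃ．"
# NFKC folds full-width ASCII variants to plain ASCII
print(unicodedata.normalize("NFKC", zenkaku_text))  # ADORES,Inc.

# Caveat: half-width katakana become full-width katakana
print(unicodedata.normalize("NFKC", "ｶﾀｶﾅ"))  # カタカナ

# Caveat: other compatibility characters are rewritten too
print(unicodedata.normalize("NFKC", "①"))  # 1
```

NFKC has no inverse, so the single-byte to double-byte direction still needs jaconv or an explicit mapping.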

Upvotes: 1
