Arbitrary

Reputation: 168

Python 3: how to convert "\u3000" (ideographic space) to " "?

japanese.txt

あかさ
あいうえ お
いい

mycode.py

with open('japanese.txt', 'r', encoding='utf-8') as f:
    old = [line.strip() for line in f.readlines()]

# `from` is a reserved word in Python, so the sender argument needs another name
send_mail(sender, recipient, title, message=f"hello {old}!")

Then I receive mail like this:

hello ['あかさ', 'あいうえ\u3000お', 'いい']!

What I want to send is this:

hello ['あかさ', 'あいうえ お', 'いい']!

How can I achieve this?

Upvotes: 3

Views: 7545

Answers (2)

jdaz

Reputation: 6053

If you want to replace the \u3000 character with a standard space, and treat other compatibility characters (such as full-width letters) the same way, you can apply NFKC normalization with the unicodedata module:

import unicodedata

# \u3000 is written as an escape so the ideographic space survives copy and paste
jText = "あかさ\nあいうえ\u3000お\nいい"

jList = [line.strip() for line in jText.split("\n")]
# ['あかさ', 'あいうえ\u3000お', 'いい']

normalizedList = [unicodedata.normalize('NFKC', line) for line in jList]
# ['あかさ', 'あいうえ お', 'いい']
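One caveat worth illustrating: NFKC rewrites every compatibility character, not only U+3000, so full-width Latin letters and half-width katakana change as well. The following is a minimal sketch, assuming you may want to touch only the ideographic space; the helper name replace_ideographic_space and the sample string are made up for this illustration.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"


def replace_ideographic_space(text):
    """Hypothetical helper: swap only U+3000 for an ASCII space."""
    return text.replace(IDEOGRAPHIC_SPACE, " ")


# Full-width 'A' (U+FF21), ideographic space, half-width katakana 'ka' (U+FF76)
sample = "\uff21\u3000\uff76"

# NFKC folds all three characters to their compatibility equivalents.
nfkc = unicodedata.normalize("NFKC", sample)

# The targeted replacement leaves the other two characters untouched.
targeted = replace_ideographic_space(sample)

print(repr(nfkc))
print(repr(targeted))
```

If the text should otherwise stay exactly as typed, the targeted replacement is probably the safer choice; NFKC fits better when you want full-width and half-width variants unified anyway.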

Upvotes: 3

awesoon

Reputation: 33691

Converting a list to a string calls repr() on each element, and repr() escapes the ideographic space, which is why you see \u3000 in your mail. Convert the list to a string yourself instead:

In [28]: l = ['あかさ', 'あいうえ\u3000お', 'いい']

In [29]: print(', '.join(map(str, l)))
あかさ, あいうえ お, いい

If you're sure all your list elements are strings, you can omit map(str, ...) and just use ', '.join(l).
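Applied to the mail message from the question, this could look roughly like the sketch below. The variable names are made up for the illustration, and the bracketed variant is only an assumption about how to keep the list-like look the asker showed.

```python
lines = ["あかさ", "あいうえ\u3000お", "いい"]

# Formatting the list itself calls repr() on each element, which escapes U+3000.
as_list = f"hello {lines}!"

# Joining the strings directly keeps the character itself in the message.
joined = f"hello {', '.join(lines)}!"

# Hypothetical variant that rebuilds the brackets and quotes by hand.
bracketed = "hello [" + ", ".join(f"'{s}'" for s in lines) + "]!"

print(as_list)
print(joined)
print(bracketed)
```

Note that the joined text still contains a real ideographic space rather than an ASCII one; it simply is no longer shown as an escape sequence. If an ASCII space is required, s.replace('\u3000', ' ') can be applied to each element first.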

Upvotes: 3
