Neo

Reputation: 13

How to split a line containing Unicode escape sequences (\u9078) in Python

When I convert a properties file to JSON, an extra backslash is added to each Unicode escape sequence (such as \u9078). How can I avoid this? See the code below.

Input File (sample.properties)

property.key.CHOOSE=\u9078\u629e

Code

import json
def convertPropertiesToJson(fileName, outputFileName, sep='=', comment_char='#'):
    props = {}
    with open(fileName, "r") as f:
        for line in f:
            l = line.strip()
            if l and not l.startswith(comment_char):
                innerProps = {}
                keyValueList = l.split(sep)
                key = keyValueList[0].strip()
                keyList = key.split('.')
                value = sep.join(keyValueList[1:]).strip()
                if keyList[1] not in props:
                    props[keyList[1]] = {}
                innerProps[keyList[2]] = value
                props[keyList[1]].update(innerProps)
    with open(outputFileName, 'w') as outfile:
        json.dump(props, outfile)

convertPropertiesToJson("sample.properties", "sample.json")

Output: (sample.json)

{"key": {"CHOOSE": "\\u9078\\u629e"}}

Expected Result:

{"key": {"CHOOSE": "\u9078\u629e"}}

Upvotes: 1

Views: 982

Answers (3)

0x01

Reputation: 508

The problem seems to be that the file stores the Unicode characters as escape sequences in plain text, so they are read as literal backslashes followed by ordinary characters. You need to decode them at some point.

Changing

l = line.strip()

to (for Python 2.x)

l = line.strip().decode('unicode-escape')

or (for Python 3.x)

l = line.strip().encode('ascii').decode('unicode-escape')

gives the desired output:

{"key": {"CHOOSE": "\u9078\u629e"}}
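For illustration, here is a minimal sketch of the same idea applied to the value alone rather than the whole line, assuming Python 3. The helper name decode_escapes is hypothetical and not part of the original code. Note that json.dumps escapes non-ASCII characters by default (ensure_ascii=True), which is why the decoded characters are written back as \u9078 with a single backslash.

```python
import json

def decode_escapes(text):
    """Turn literal backslash-u sequences into the characters they name.

    Hypothetical helper. Assumes text contains only ASCII characters;
    encode('ascii') raises UnicodeEncodeError otherwise.
    """
    return text.encode('ascii').decode('unicode-escape')

raw = r'\u9078\u629e'          # 12 characters, as read from the file
value = decode_escapes(raw)    # 2 characters after decoding

print(len(raw), len(value))
# Default ensure_ascii=True: non-ASCII characters are written as \uXXXX
print(json.dumps({"CHOOSE": value}))
# ensure_ascii=False: the characters themselves are written instead
print(json.dumps({"CHOOSE": value}, ensure_ascii=False))
```

Decoding only the value keeps the key untouched, which may matter if keys can contain backslashes.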

Upvotes: 0

VPfB

Reputation: 17342

The problem is that the input is read as-is, so each \u is copied literally as two characters (a backslash followed by u). The easiest fix is probably this:

with open(fileName, "r", encoding='unicode-escape') as f:

This will decode the escaped Unicode characters.
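A small self-contained sketch of this approach, assuming Python 3. It writes a throwaway copy of the sample file to a temporary directory first (the temporary path is only for illustration). Two caveats worth keeping in mind: the unicode-escape codec treats any non-escaped byte as Latin-1, so a file that also contains raw UTF-8 text would be decoded incorrectly, and other backslash sequences such as \n or \t in the file are interpreted as well.

```python
import json
import os
import tempfile

# Write a throwaway sample file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.properties")
with open(path, "w", encoding="ascii") as f:
    f.write("property.key.CHOOSE=\\u9078\\u629e\n")

# The codec converts the escape sequences while the file is being read.
with open(path, "r", encoding="unicode-escape") as f:
    line = f.readline().strip()

# Split on the first separator only, so '=' inside the value is preserved.
value = line.split("=", 1)[1]
print(len(value))
print(json.dumps({"CHOOSE": value}))
```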

Upvotes: 2

Rahul

Reputation: 11560

I don't know the solution to your problem, but I found out where the problem occurs.

with open('sample.properties', encoding='utf-8') as f:
    for line in f:
        print(line)
        print(repr(line))
        d = {}
        d['line'] = line
        print(d)

Output:
property.key.CHOOSE=\u9078\u629e
'property.key.CHOOSE=\\u9078\\u629e'
{'line': 'property.key.CHOOSE=\\u9078\\u629e'}

I don't know why adding the string to a dictionary shows its repr.
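For what it's worth, the dictionary does not change the string. Printing a dict displays the repr() of its keys and values, and repr() doubles backslashes for display only. The sketch below, assuming Python 3, checks that the string holds one backslash per escape sequence in every case. The extra backslash in the JSON output comes from json.dump escaping that literal backslash, not from the dictionary.

```python
line = 'property.key.CHOOSE=\\u9078\\u629e'   # same text as in the file

print(line)            # str(): one backslash per escape sequence
print(repr(line))      # repr(): backslashes doubled for display only
print({'line': line})  # dict display uses repr() of keys and values

# The stored string is identical in all three cases.
print(line.count('\\'))   # number of backslashes actually in the string
print(len(line))
```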

Upvotes: 0
