JSv4
JSv4

Reputation: 313

General Python Unicode / ASCII Casting Issue Causing Trouble in Pyorient

UPDATE: I opened an issue on github based on a Ivan Mainetti's suggestion. If you want to weigh in there, it is :https://github.com/orientechnologies/orientdb/issues/6757

I am working on a database based on OrienDB and using a python interface for it. I've had pretty good luck with it, but I've run into a problem that seems to be the driver's (pyorient) wonkiness when dealing with certain unicode characters.

The data structure I'm uploading to the database looks like this:

new_Node = {'@Nodes':
                {
                    "Abs_Address":Ono.absolute_address,
                    'Content':Ono.content,
                    'Heading':Ono.heading,
                    'Type':Ono.type,
                    'Value':Ono.value
                }
}

I have created literally hundreds of records flawlessly on OrientDB / pyorient. I don't think the problem is necessarily a pyorient specific question, however, as I think the reason it is failing on a particular record is because the Ono.absolute_address element has a unicode character that pyorient is somehow choking on.

The record I want to create has an Abs_address of /u/c/2/a1–2, but the node I get when I pass the value to the my data structure above is this:

{'@Nodes': {'Content': '', 'Abs_Address': u'/u/c/2/a1\u20132', 'Type': 'section', 'Heading': ' Transferred', 'Value': u'1\u20132'}}

I think that somehow my problem is python is mixing unicode and ascii strings / chars? I'm a bit new to python and not declaring types, so I'm hoping this isn't an issue with pyorient perse given that the new_Node object doesn't output the properly formatted string...? Or is this an instance of pyorient not liking unicode? I'm tearing my hair out on this one. Any help is appreciated.

In case the error is coming from pyorient and not some kind of text encoding, here's the pyorient-related info. I am creating the record using this code:

rec_position = self.pyo_client.record_create(14, new_Node)

And this is the error I'm getting:

com.orientechnologies.orient.core.storage.ORecordDuplicatedException - Cannot index record Nodes{Content:,Abs_Address:null,Type:section,Heading: Transferred,Value:null}: found duplicated key 'null' in index 'Nodes.Abs_Address' previously assigned to the record #14:558

The error is odd as it suggests that the backend database is getting a null object for the address. Apparently it did create an entry for this "address," but it's not what I want it to do. I don't know why address strings with unicode are coming up null in the database... I can create it through orientDB studio using the exact string I fed into the new_Node data structure... but I can't use python to do the same thing.

Someone help?

EDIT:

Thanks to Laurent, I've narrowed the problem down to something to do with unicode objects and pyorient. Whenever a the variable I am passing is type unicode, the pyorient adapter sends a null value to the OrientDB database. I determined the value that is causing the problem is an ndash symbol, and Laurent helped me replace it with a minus sign using this code

.replace(u"\u2013",u"-")

When I do that, however, pyorient gets unicode objects which it then passes as null values... This is not good. I can fix this short term by recasting the string using str(...) and this appears to solve my immediate problem:

str(Ono.absolute_address.replace(u"\u2013",u"-"))

. Problem is, I know I will have symbols and other unusual characters in my DB data. I know the database supports the unicode strings because I can add them manually or use SQL syntax to do what I cannot do via pyorient and python... I am assuming this is a dicey casting issue somewhere, but I'm not really sure where. This seems very similar to this problem: http://stackoverflow.duapp.com/questions/34757352/how-do-i-create-a-linked-record-in-orientdb-using-pyorient-library

Any pyorient people out there? Python gods? Lucky s0bs? =)

Upvotes: 0

Views: 255

Answers (1)

anber
anber

Reputation: 874

I have tried your example on Python 3 with the development branch of pyorient with the latest version of OrientDB 2.2.11. If I pass the values without escaping them, your example seems to work for me and I get the right values back.

So this test works:

def test_test1(self):
    new_Node = {'@Nodes': {'Content': '',
                           'Abs_Address': '/u/c/2/a1–2',
                           'Type': 'section',
                           'Heading': ' Transferred',
                           'Value': u'1\u20132'}
                }

    self.client.record_create(14, new_Node)

    result = self.client.query('SELECT * FROM V where Abs_Address="/u/c/2/a1–2"')
    assert result[0].Abs_Address == '/u/c/2/a1–2'

I think you may be saving the unicode value as an escaped value and that's where things get tricky.

I don't trust replacing values myself so I usually escape the unicode values I send to orientdb with the following code:

import json
def _escape(string):
    return json.dumps(string)[1:-1]

The following test would fail because the escaped value won't match the escaped value in the DB so no record will be returned:

def test_test2(self):
    new_Node = {'@Nodes': {'Content': '',
                           'Abs_Address': _escape('/u/c/2/a1–2'),
                           'Type': 'section',
                           'Heading': ' Transferred',
                           'Value': u'1\u20132'}
                }

    self.client.record_create(14, new_Node)

    result = self.client.query('SELECT * FROM V where Abs_Address="%s"' % _escape('/u/c/2/a1–2'))
    assert  result[0].Abs_Address.encode('UTF-8').decode('unicode_escape') == '/u/c/2/a1–2'

In order to fix this, you have to escape the value twice:

def test_test3(self):
    new_Node = {'@Nodes': {'Content': '',
                           'Abs_Address': _escape('/u/c/2/a1–2'),
                           'Type': 'section',
                           'Heading': ' Transferred',
                           'Value': u'1\u20132'}
                }

    self.client.record_create(14, new_Node)

    result = self.client.query('SELECT * FROM V where Abs_Address="%s"' % _escape(_escape('/u/c/2/a1–2')))
    assert  result[0].Abs_Address.encode('UTF-8').decode('unicode_escape') == '/u/c/2/a1–2'

This test will succeed because you will now be asking for the escaped value in the DB.

Upvotes: 1

Related Questions