Linh Nguyen
Linh Nguyen

Reputation: 3890

Postgresql Json column not saving utf-8 character

Hi i'm trying to save data that i get from this api into my Json column in my postgresql using sqlalchemy and python requests.

r = requests.get(api)

content = r.content

data = json.loads(content)

crawl_item = {}
crawl_item = session.query(CrawlItem).filter_by(site_id=3, href=list_id).first()
crawl_item.description = data['ad']['body']
crawl_item.meta_data = {}
crawl_item.meta_data["ward"] = data['ad_params']['ward']['value']

try:
      session.commit()
except:
      session.rollback()
      raise
finally:
      ret_id = crawl_item.id
      session.close()

my model:

class CrawlItem(Base):
    ...
    description = Column(Text)
    meta_data = Column(postgresql.JSON)

i want to get the value of ward :

"ward": {
      "id": "ward",
      "value": "Thị trấn Trạm Trôi",
      "label": " Phường, thị xã, thị trấn"
    }

I already encoding my postgresql to utf-8 so other fields that are not json column (description = Column(Text)) save utf-8 characters normally only my json column data are not decode:

enter image description here

{ 
   "ward":"Th\u1ecb tr\u1ea5n Tr\u1ea1m Tr\u00f4i"
}

description column:

description

meta_data column:

meta_data

i had tried using :

crawl_item.meta_data["ward"] = data['ad_params']['ward']['value'].decode('utf-8')

but the ward data don't get save

I have no idea what is wrong, hope someone can help me

EDIT:

i checked the data with psql and got these:

description column:

enter image description here

meta_data column:

enter image description here

It seems like only meta_data json column have trouble with the characters

Upvotes: 3

Views: 4264

Answers (2)

zoheir maz
zoheir maz

Reputation: 147

use "jsonb" data type for your json column or cast "meta_data" field to "jsonb" like this:

select meta_data::jsonb from your_table;

Upvotes: -2

Rustam Yunusov
Rustam Yunusov

Reputation: 180

Sqlalchemy serializes JSON field before save to db (see url and url and url).

json_serializer = dialect._json_serializer or json.dumps

By default, the PostgreSQL' dialect uses json.dumps and json.loads.


When you work with Text column, the data is converted in the following flow:

str -> bytes in utf-8 encoding

When you work with JSON column for PostgreSQL dialect, the data is converted in the following flow:

dict -> str with escaped non-ascii symbols -> bytes in utf-8 encoding

You can override the serializer in your engine configuration using json_serializer field:

json_serializer=partial(json.dumps, ensure_ascii=False)

Upvotes: 4

Related Questions