icn

Reputation: 17876

Django dumpdata UTF-8 (Unicode)

Is there an easy way to dump UTF-8 data from a database?

I know this command:

manage.py dumpdata > mydata.json

But in the resulting mydata.json file, the Unicode data looks like this:

"name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"

I would like to see a real Unicode string like 全球卫星定位系统 (Chinese).
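For context, here is a minimal sketch using only Python's standard json module, which Django's JSON serializer builds on. It shows where the escapes come from: json.dumps escapes every non-ASCII character by default (ensure_ascii=True), and the escaped and readable forms parse back to the same string. The sample value is the one from the question; the variable names are just for illustration.

```python
import json

# The value from the question, written with Python escapes (the same code points).
name = "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"

# Default behaviour: ensure_ascii=True turns every non-ASCII character into \uXXXX.
escaped = json.dumps({"name": name})

# ensure_ascii=False writes the characters themselves instead.
readable = json.dumps({"name": name}, ensure_ascii=False)

# Both forms parse back to the identical Python string; only the file text differs.
assert json.loads(escaped)["name"] == json.loads(readable)["name"] == name
```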

Upvotes: 32

Views: 24560

Answers (14)

Jon Knebel

Reputation: 1

In 2023, I still had a rough time with this. I had to follow @wertartem's suggestion and then change the encoding of the output file to get it to work. The "-Xutf8" flag didn't seem necessary for me, but someone reading this might need to follow all three steps.

I also ran into a smaller issue, which I solved by excluding admin.logentry from the export (I added these flags: "-e auth -e contenttypes -e auth.Permission -e admin.logentry").

My full process:

  1. For proper encoding, at least on Windows, make sure UTF-8 worldwide language support is enabled. To do this (at least on Windows 11), go to "Time & Language" > "Language & Region". Under "Related Settings", click "Administrative Language Settings". Click "Change System Locale". Check the box for "Beta: Use Unicode UTF-8 for worldwide language support". Restart the computer. Once this is enabled, you can skip this step for future exports.
  2. Run this command in a terminal (here, I'm exporting to a subdirectory and excluding several apps and models from the export): python -Xutf8 manage.py dumpdata --format=json --natural-foreign --natural-primary -e auth -e contenttypes -e auth.Permission -e admin.logentry > databases/seeds/dump.json
  3. Open the dump.json file and run the VS Code command "Change File Encoding" to save it with UTF-8 encoding. If VS Code crashes, you can do this in Sublime Text instead by opening the file and choosing "Save with Encoding" from the File menu.
  4. Change connection to the new database.
  5. python manage.py reset_db (reset_db is provided by the django-extensions package)
  6. python manage.py migrate
  7. python manage.py loaddata "databases/seeds/dump.json"

You may want (but don't need) to adjust the step 2 command slightly. See the dumpdata documentation: https://docs.djangoproject.com/en/4.2/ref/django-admin/#dumpdata
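If you would rather script step 3 than re-save the file in an editor, here is a minimal sketch. It assumes the dump was written as UTF-16, which is one common cause on Windows because Windows PowerShell 5.1's `>` redirection writes UTF-16 by default; check which encoding your editor actually reports for the file and pass that instead. The function name `convert_to_utf8` is hypothetical.

```python
from pathlib import Path


def convert_to_utf8(path: str, source_encoding: str = "utf-16") -> None:
    """Rewrite a text file in place as UTF-8 (hypothetical helper).

    source_encoding is an assumption: pass whatever encoding the file was
    really written in (e.g. "utf-16" for a Windows PowerShell 5.1 redirect).
    """
    p = Path(path)
    text = p.read_text(encoding=source_encoding)  # decode with the old encoding
    p.write_text(text, encoding="utf-8")          # re-encode as UTF-8 (no BOM)
```

Usage: call `convert_to_utf8("databases/seeds/dump.json")` before running loaddata in step 7.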

Upvotes: 0

Zack Plauché

Reputation: 4210

This solution worked for me from @Julian Polard's post.

Basically, just add -Xutf8 after py or python (and before manage.py) when running this command:

python -Xutf8 manage.py dumpdata > data.json

Please upvote his answer as well if this worked for you ^_^
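For reference, -Xutf8 turns on Python's UTF-8 mode (PEP 540, Python 3.7+), which makes UTF-8 the encoding for stdout and for files opened without an explicit encoding; setting the environment variable PYTHONUTF8=1 has the same effect. A minimal sketch that starts a child interpreter with the flag and asks what it sees (the helper name `utf8_mode_status` is hypothetical):

```python
import os
import subprocess
import sys
from typing import Tuple


def utf8_mode_status(use_flag: bool = True) -> Tuple[int, str]:
    """Start a child Python and return (sys.flags.utf8_mode, sys.stdout.encoding) as it sees them."""
    cmd = [sys.executable]
    if use_flag:
        cmd.append("-Xutf8")  # the same flag as in `python -Xutf8 manage.py dumpdata`
    cmd += ["-c", "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"]
    # PYTHONIOENCODING overrides the stdout encoding, so leave it out for a clean check.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONIOENCODING"}
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, env=env)
    flag, encoding = result.stdout.split()
    return int(flag), encoding
```

On a machine whose system locale is not UTF-8, `utf8_mode_status(False)` would typically report a flag of 0 and a legacy code page instead.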

Upvotes: 20

wolfpan

Reputation: 11

Here's a new solution.

I just shared a repo on GitHub: django-dump-load-utf8.

I think this is a bug in Django, and I hope someone can merge my project into Django.

It's a workable solution, but I think fixing the bug in Django itself would be better.

manage.py dumpdatautf8 --output data.json
manage.py loaddatautf8 data.json

Upvotes: 1

Wertartem

Reputation: 237

Here is the solution from djangoproject.com:
In Windows Settings, there's a "Use Unicode UTF-8 for worldwide language support" checkbox under "Language" > "Administrative Language Settings" > "Change system locale" ("Region Settings"). If we apply that and reboot, then we get a sensible, modern default encoding from Python.
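To check whether the setting (or -Xutf8) actually took effect, you can ask Python which defaults it picked up. A minimal standard-library sketch; the function name `encoding_report` is hypothetical, and the exact encoding names reported can vary by platform and Python version, so look for a UTF-8 variant rather than one specific string.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the default encodings this Python process will use."""
    return {
        # Default used by open() when no encoding= argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of the current stdout, which is what a `>` redirect receives.
        "stdout": getattr(sys.stdout, "encoding", None) or "unknown",
        "filesystem": sys.getfilesystemencoding(),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```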

Upvotes: 0

highpost

Reputation: 1323

This problem has been fixed for both JSON and YAML in Django 3.1.

Upvotes: 1

Cédryc Ruth

Reputation: 1

I encountered the same issue. After reading all the answers, I came up with a mix of Ali's and darthwade's answers:

manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell

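import codecs
src = "/categories.json"
dst = "/categories-new.json"
# Read raw bytes and turn the \uXXXX escapes back into real characters.
with codecs.open(src, "rb") as f:
    source = f.read().decode("unicode-escape")
# codecs.open with an encoding writes the text out as UTF-8.
with codecs.open(dst, "wb", "utf-8") as f:
    f.write(source)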

In Python 3, I had to open the source file in binary mode and decode it as unicode-escape. I also passed "utf-8" as the encoding when opening the output file in write (binary) mode.

I hope it helps :)

Upvotes: 0

Denis Eliseev

Reputation: 529

I usually add the following lines to my Makefile:

.PHONY: dump

# make APP=core MODEL=Schema dump
dump:
	@python manage.py dumpdata --indent=2 --natural-foreign --natural-primary ${APP}.${MODEL} | \
	python -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" \
	> ${APP}/fixtures/${MODEL}.json

This works for a standard Django project structure; adjust the paths if your project structure is different.

Upvotes: 1

Ali Shamakhi

Reputation: 83

YOU has provided a good, accepted answer, but keep in mind that Python 3 distinguishes text from binary data, so both files must be opened in binary mode:

open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))

Otherwise, the error AttributeError: 'str' object has no attribute 'decode' will be raised.
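The same idea as a reusable function, sketched with pathlib (the name `unescape_dump` is hypothetical). One caveat: the unicode_escape codec unescapes every backslash sequence, not just \uXXXX, so an escaped quote (\") or newline (\n) inside a string value also becomes a raw character and can leave the file as invalid JSON. It also assumes the input is pure ASCII, which dumpdata's escaped output is.

```python
from pathlib import Path


def unescape_dump(src: str, dst: str) -> None:
    r"""Turn \uXXXX escapes in an ASCII dumpdata file into real characters (hypothetical helper)."""
    raw = Path(src).read_bytes()         # binary read: bytes, which have .decode() in Python 3
    text = raw.decode("unicode_escape")  # caveat: also unescapes \" \\ \n inside strings
    Path(dst).write_text(text, encoding="utf-8")
```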

Upvotes: 2

Victor Akimov

Reputation: 529

You can create your own serializer that passes the ensure_ascii=False argument to json.dump():

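# serializers/json_no_uescape.py
from django.core.serializers.json import Deserializer  # noqa: F401 (re-exported so loaddata can read this format)
from django.core.serializers.json import Serializer as JSONSerializer


class Serializer(JSONSerializer):
    """Django's JSON serializer, but with non-ASCII characters written as-is."""

    def _init_options(self):
        super()._init_options()
        self.json_kwargs["ensure_ascii"] = False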

Then register the new serializer (for example, in your app's __init__.py file):

from django.core.serializers import register_serializer

register_serializer('json-no-uescape', 'serializers.json_no_uescape')

Then you can run:

manage.py dumpdata --format=json-no-uescape > output.json

Upvotes: 3

darthwade

Reputation: 1434

import codecs
src = "/categories.json"
dst = "/categories-new.json"
# Python 2 only: the 'string-escape' codec was removed in Python 3
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)

Upvotes: 0

Tisho

Reputation: 8482

After struggling with similar issues, I just found that the XML serializer handles UTF-8 properly.

manage.py dumpdata --format=xml > output.xml

I had to transfer data from Django 0.96 to Django 1.3. After numerous tries with dump/load data, I've finally succeeded using xml. No side effects for now.

I hope this helps someone, as I landed on this thread while looking for a solution.
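Why XML behaves differently: Django's XML serializer writes through SimplerXMLGenerator (django.utils.xmlutils), a subclass of the standard library's xml.sax.saxutils.XMLGenerator, which escapes only markup characters such as &, < and > and writes everything else in the declared encoding. Here is a minimal standard-library sketch of that behavior; it is not Django's actual serializer code, and `field_to_xml` is a hypothetical name.

```python
import io
from xml.sax.saxutils import XMLGenerator


def field_to_xml(name: str, value: str) -> bytes:
    """Serialize one <field> element the way an XMLGenerator-based writer would."""
    buf = io.BytesIO()
    gen = XMLGenerator(buf, encoding="utf-8")
    gen.startDocument()                        # writes the <?xml ... encoding="utf-8"?> header
    gen.startElement("field", {"name": name})
    gen.characters(value)                      # escapes &, <, > only; other characters stay as-is
    gen.endElement("field")
    gen.endDocument()
    return buf.getvalue()
```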

Upvotes: 18

dir01

Reputation: 2230

Here I wrote a snippet for that. Works for me!

Upvotes: 5

YOU

Reputation: 123831

django-admin.py dumpdata yourapp can dump the data for that purpose.

Or if you use MySQL, you could use the mysqldump command to dump the whole database.

And this thread has many ways to dump data, including manual methods.

UPDATE, because the OP edited the question:

To convert the JSON-escaped strings into human-readable text, you could use this:

open("mydata-new.json","wb").write(open("mydata.json").read().decode("unicode_escape").encode("utf8"))

Upvotes: 12

Ignacio Vazquez-Abrams

Reputation: 798676

You need to either find the json.dump*() call in the Django code, pass the additional option ensure_ascii=False, and then encode the result afterwards, or use json.load*() to load the JSON and then dump it again with that option.
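The second option (load, then dump again with ensure_ascii=False) takes only a few lines. A minimal sketch, assuming the dump is a regular JSON file (dumpdata's default format); the function name `reencode_fixture` is hypothetical. Because it goes through a real JSON parser, escaped quotes and newlines inside values stay valid, unlike the unicode_escape approaches.

```python
import json


def reencode_fixture(src: str, dst: str, indent: int = 2) -> None:
    """Re-dump a dumpdata JSON fixture with real characters instead of \\uXXXX escapes."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes non-ASCII characters as-is; the explicit
        # encoding="utf-8" keeps the result independent of the OS locale.
        json.dump(data, f, ensure_ascii=False, indent=indent)
```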

Upvotes: 6
