Reputation: 17876
Is there a easy way to dump UTF-8 data from a database?
I know this command:
manage.py dumpdata > mydata.json
But the data I got in the file mydata.json, Unicode data looks like:
"name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"
I would like to see a real Unicode string like 全球卫星定位系统
(Chinese).
Upvotes: 32
Views: 24560
Reputation: 1
In 2023, I still had a rough time with this. I had to follow @wertartem's suggestion and then Change the file encoding of the outputted file to get it to work. It seems the "-Xutf8" tag wasn't necessary for me, but someone reading this might need to follow all 3 steps.
I also had a smaller issue I solved by excluding the admin.logentry from the export (added these tags "-e auth -e contenttypes -e auth.Permission -e admin.logentry")
My full process:
Your step 2 command may desire (but not require) slight modification. Check out this: https://docs.djangoproject.com/en/4.2/ref/django-admin/#dumpdata
Upvotes: 0
Reputation: 4210
This solution worked for me from @Julian Polard's post.
Basically just add -Xutf8
in front of py
or python
when running this command:
python -Xutf8 manage.py dumpdata > data.json
Please upvote his answer as well if this worked for you ^_^
Upvotes: 20
Reputation: 11
here's a new solution.
I just shared a repo on github: django-dump-load-utf8.
However, I think this is a bug of django, and hope someone can merge my project to django.
A not bad solution, but I think fix the bug in django would be better.
manage.py dumpdatautf8 --output data.json
manage.py loaddatautf8 data.json
Upvotes: 1
Reputation: 237
Here is the solution from djangoproject.com
You go to Settings there's a "Use Unicode UTF-8 for worldwide language support", box in "Language" - "Administrative Language Settings" - "Change system locale" - "Region Settings". If we apply that, and reboot, then we get a sensible, modern, default encoding from Python.
djangoproject.com
Upvotes: 0
Reputation: 1323
This problem has been fixed for both JSON and YAML in Django 3.1.
Upvotes: 1
Reputation: 1
I encountered the same issue. After reading all the answers, I came up with a mix of Ali and darthwade's answers:
manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell
import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, "rb").read().decode('unicode-escape')
codecs.open(dst, "wb","utf-8").write(source)
In Python 3, I had to open the file in binary mode and decode as unicode-escape. Also I added utf-8 when I open in write (binary) mode.
I hope it helps :)
Upvotes: 0
Reputation: 529
I'm usually add next strings in my Makefile:
.PONY: dump
# make APP=core MODEL=Schema dump
dump:
@python manage.py dumpdata --indent=2 --natural-foreign --natural-primary ${APP}.${MODEL} | \
python -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" \
> ${APP}/fixtures/${MODEL}.json
It's ok for standard django project structure, fix if your project structure is different.
Upvotes: 1
Reputation: 83
As YOU has provided a good answer that is accepted, it should be considered that python 3 distincts text and binary data, so both files must be opened in binary mode:
open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))
Otherwise, the error AttributeError: 'str' object has no attribute 'decode'
will be raised.
Upvotes: 2
Reputation: 529
You can create your own serializer which passes ensure_ascii=False
argument to json.dumps
function:
# serfializers/json_no_uescape.py
from django.core.serializers.json import *
class Serializer(Serializer):
def _init_options(self):
super(Serializer, self)._init_options()
self.json_kwargs['ensure_ascii'] = False
Then register new serializer (for example in your app __init__.py
file):
from django.core.serializers import register_serializer
register_serializer('json-no-uescape', 'serializers.json_no_uescape')
Then you can run:
manage.py dumpdata --format=json-no-uescape > output.json
Upvotes: 3
Reputation: 1434
import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)
Upvotes: 0
Reputation: 8482
After struggling with similar issues, I've just found, that xml formatter handles UTF8 properly.
manage.py dumpdata --format=xml > output.xml
I had to transfer data from Django 0.96 to Django 1.3. After numerous tries with dump/load data, I've finally succeeded using xml. No side effects for now.
Hope this will help someone, as I've landed at this thread when looking for a solution..
Upvotes: 18
Reputation: 123831
django-admin.py dumpdata yourapp could dump for that purpose.
Or if you use MySQL, you could use the mysqldump command to dump the whole database.
And this thread has many ways to dump data, including manual methods.
UPDATE: because OP edited the question.
To convert from JSON encoding string to human readable string you could use this:
open("mydata-new.json","wb").write(open("mydata.json").read().decode("unicode_escape").encode("utf8"))
Upvotes: 12
Reputation: 798676
You need to either find the call to json.dump*()
in the Django code and pass the additional option ensure_ascii=False
and then encode the result after, or you need to use json.load*()
to load the JSON and then dump it with that option.
Upvotes: 6