Reputation: 48446
I have a Python 2.7 dict such as {u"eat": u"糖果", u"drink": u"café"}, and I need to transfer it using JSON. The JSON string must be regular ASCII and it must be less than 256 chars.
So far, I have coded this:
import json

def payload_to_json(payload, max_size=256):
    while True:
        json_string = json.dumps(payload, separators=(',', ':'))
        if len(json_string) <= max_size:
            return json_string
        # find the key with the longest value
        max_length, found_key = 0, None
        for key, value in payload.iteritems():
            length = len(value)
            if length > max_length:
                max_length = length
                found_key = key
        if max_length == 0:
            return ""  # just in case max_size is really low
        payload[found_key] = payload[found_key][:-1]  # remove one char
It works as expected:
>>> payload = {u"eat": u"糖果", u"drink": u"café"}
>>> print payload_to_json(payload)
{"drink":"caf\u00e9","eat":"\u7cd6\u679c"}
>>> print payload_to_json(payload, max_size=41)
{"drink":"caf","eat":"\u7cd6\u679c"}
>>> print payload_to_json(payload, max_size=35)
{"drink":"ca","eat":"\u7cd6\u679c"}
>>> print payload_to_json(payload, max_size=34)
{"drink":"c","eat":"\u7cd6\u679c"}
>>> print payload_to_json(payload, max_size=30)
{"drink":"c","eat":"\u7cd6"}
>>> print payload_to_json(payload, max_size=21)
{"drink":"","eat":""}
>>> print payload_to_json(payload, max_size=20)
It seems to me that there should be a way to optimize this! I'm stripping one character at a time; it feels so wrong.
My question is very close to this one, except that I use Python 2.7, and the JSON encoder produces pretty long strings whenever the source strings contain non-ASCII Unicode chars.
Plus, I'm pretty sure this will break with UTF-16 surrogate pairs...
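For example, on a narrow Python 2 build (where non-BMP characters are stored as UTF-16 surrogate pairs), slicing off one "character" can leave a lone surrogate, and json.dumps will happily emit it:
>>> s = u"\U0001F600"   # a non-BMP character (emoji)
>>> len(s)              # 2 on a narrow build: a surrogate pair
2
>>> json.dumps(s[:-1])  # the slice leaves a lone high surrogate
'"\\ud83d"'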
Upvotes: 2
Views: 3935
Reputation: 2113
Why don't you use the strategy from the post you linked: measure the first generated JSON string, then strip the right number of characters from the values, in your preferred order?
Otherwise, you could estimate the number of characters the JSON encoding adds by counting: for each mapped entry, the chars "":"",
plus the overall {}
, minus one comma. (Unless you have a more complicated nested structure, obviously.)
The Unicode handling shouldn't be a problem as long as you use the u''
notation (not sure, but it shouldn't be difficult to check).
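A minimal sketch of that counting, assuming a non-empty flat dict of string keys and values (estimate_json_size is my own name, not from the post):

import json

def estimate_json_size(payload):
    # Every entry contributes its serialized key and value (quotes and
    # escapes included) plus a colon and a comma; the object adds {} and
    # the last entry has no trailing comma, hence the initial 2 - 1.
    total = 2 - 1
    for key, value in payload.iteritems():
        total += len(json.dumps(key)) + len(json.dumps(value)) + 2
    return total

For the payload in the question this returns 42, which matches len(json.dumps(payload, separators=(',', ':'))).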
Upvotes: 0
Reputation: 365697
If you're trying to make this faster (which you shouldn't be, unless you know this is a hotspot in your program with a real performance cost), you can first guess the number of characters to strip, and then deal with leftovers.
First, if you need to strip 52 characters, and there are 10 keys, you need to strip 6 chars each from 2 keys, and 5 each from the other 8, right? Except, of course, that you may be trying to strip 6 chars from something that's only 4 chars long, which means you'll end up still 2 chars over the limit. But you can keep track of those leftovers and deal with them after you're done. It's unlikely that there will be enough leftovers to make another pass through the "fast" version worth doing, so you might as well just use the "slow" version.
import json

def payload_to_json(payload, max_size=256):
    json_string = json.dumps(payload, separators=(',', ':'))
    chars_to_strip = len(json_string) - max_size
    if chars_to_strip <= 0:
        return json_string
    # spread the stripping evenly across the keys
    key_count = len(payload)
    chars_per_key, extras = divmod(chars_to_strip, key_count)
    leftover = 0
    for i, key in enumerate(payload):
        to_strip = chars_per_key + (i < extras)
        if not to_strip:
            continue  # value[:-0] would empty the value entirely
        orig_len = len(payload[key])
        if orig_len < to_strip:
            payload[key] = ''
            leftover += to_strip - orig_len
        else:
            payload[key] = payload[key][:-to_strip]
    if leftover:
        # fall back to the one-char-at-a-time version from the question
        return slow_payload_to_json(payload, max_size)
    else:
        return json.dumps(payload, separators=(',', ':'))
I'm not sure this actually will speed things up in your use cases. For very small objects and max sizes, I wouldn't be surprised if it actually slows things down. But for huge objects way over the max size, it would probably help a lot.
Upvotes: 1
Reputation: 77454
How about computing the serialized size of each entry,
then choosing as many entries as fit within the desired length?
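A minimal sketch of that idea, assuming a flat dict of string keys and values (fit_entries is my own name):

import json

def fit_entries(payload, max_size=256):
    # Serialize each entry on its own, then keep adding whole entries
    # while the running total (braces and commas included) still fits.
    kept, size = {}, 2  # 2 for the surrounding {}
    for key, value in payload.iteritems():
        entry = json.dumps({key: value}, separators=(',', ':'))
        cost = len(entry) - 2 + (1 if kept else 0)  # drop its braces; add a comma after the first entry
        if size + cost <= max_size:
            kept[key] = value
            size += cost
    return json.dumps(kept, separators=(',', ':'))

This drops whole entries instead of truncating their values, which may or may not be acceptable here.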
Either way, this sounds like a really bad idea overall.
Upvotes: 0