Reputation: 3233
I am trying to compress a large string on a client program in C# (.net 4) and send it to a server (django, python 2.7) using a PUT request. Ideally I want to use the standard library at both ends, so I am trying to use gzip.
My C# code is:
public static string Compress(string s) {
var bytes = Encoding.Unicode.GetBytes(s);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
msi.CopyTo(gs);
}
return Convert.ToBase64String(mso.ToArray());
}
}
The python code is:
s = base64.standard_b64decode(request)
buff = cStringIO.StringIO(s)
with gzip.GzipFile(fileobj=buff) as gz:
decompressed_data = gz.read()
It's almost working, but the output is: {▯"▯c▯h▯a▯n▯g▯e▯d▯"▯} when it should be {"changed"}, i.e. every other letter is something weird. If I take out every other character by doing decompressed_data[::2], then it works, but it's a bit of a hack, and clearly there is something else wrong.
I'm wondering if I need to base64 encode it at all for a PUT request? Is this only necessary for POST?
Upvotes: 4
Views: 2059
Reputation: 29794
I think the main problem might be C# uses UTF-16
encoded strings. This may yield a problem similar to yours. As any other encoding problem, we might need a little luck here but I guess you can solve this by doing:
decompressed_data = gz.read().decode('utf-16')
There, decompressed_data should be Unicode
and you can treat it as such for further work.
UPDATE: This worked for me:
static void Main(string[] args)
{
FileStream f = new FileStream("test", FileMode.CreateNew);
using (StreamWriter w = new StreamWriter(f))
{
w.Write(Compress("hello"));
}
}
public static string Compress(string s)
{
var bytes = Encoding.Unicode.GetBytes(s);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream())
{
using (var gs = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(gs);
}
return Convert.ToBase64String(mso.ToArray());
}
}
import base64
import cStringIO
import gzip
f = open('test','rb')
s = base64.standard_b64decode(f.read())
buff = cStringIO.StringIO(s)
with gzip.GzipFile(fileobj=buff) as gz:
decompressed_data = gz.read()
print decompressed_data.decode('utf-16')
Without decode('utf-16)
it printed in the console:
>>>h e l l o
with it it did well:
>>>hello
Good luck, hope this helps!
Upvotes: 4
Reputation: 1500375
It's almost working, but the output is: {▯"▯c▯h▯a▯n▯g▯e▯d▯"▯} when it should be {"changed"}
That's because you're using Encoding.Unicode
to convert the string to bytes to start with.
If you can tell Python which encoding to use, you could do that - otherwise you need to use an encoding on the C# side which matches what Python expects.
If you can specify it on both sides, I'd suggest using UTF-8 rather than UTF-16. Even though you're compressing, it wouldn't hurt to make the data half the size (in many cases) to start with :)
I'm also somewhat suspicious of this line:
buff = cStringIO.StringIO(s)
s
really isn't text data - it's compressed binary data, and should be treated as such. It may be okay - it's just worth checking whether there's a better way.
Upvotes: 2