Reputation: 1068
I am trying to do a very basic character set conversion, like iconv does, but I can't figure out why it's not working. I am using Python's decode and encode routines, but it looks like I'm missing something very basic.
Code:
#!/usr/bin/python
import sys
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print ("wrong input")
        sys.exit(1)
    fi = open(sys.argv[1], "r")
    buf = fi.read()
    fi.close()
    print ("got input: \n{0}".format(buf))
    buf.decode("big5", "strict").encode("utf8", "strict")
    fo = open(sys.argv[2], "w")
    fo.write(buf)
    fo.close()
    print ("changed: \n{0}".format(buf))
Input files: hello.big5 was obtained by converting the UTF-8 file with iconv.
[workspace] > cat hello.utf8
hello = 你好
[workspace] > cat hello.big5
hello = �A�n
When executed:
[workspace] > ./test.py hello.big5 out
got input:
hello = �A�n
changed:
hello = �A�n
Can someone point out where I am going wrong?
Upvotes: 1
Views: 4768
Reputation: 29794
This line does not modify buf, as you appear to think it does:
buf.decode("big5", "strict").encode("utf8", "strict")
You can see this in the docs for encode and decode. Those methods return new str or unicode objects; they don't modify the object they are called on. If you want buf to hold the converted text, just assign it the result:
buf = buf.decode("big5", "strict").encode("utf8", "strict")
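To make the difference concrete, here is a minimal sketch that contrasts the two calls. The helper name reencode and the in-memory sample (a stand-in for the contents of hello.big5) are assumptions made for illustration; they are not part of the original script.

```python
# -*- coding: utf-8 -*-

def reencode(data, src="big5", dst="utf8"):
    """Return a new byte string; `data` itself is never modified."""
    return data.decode(src, "strict").encode(dst, "strict")

# Stand-in for the bytes read from hello.big5 (assumed sample text).
buf = u"hello = \u4f60\u597d".encode("big5")
before = buf

# Result is discarded, so buf still holds the Big5 bytes.
reencode(buf)
unchanged = (buf == before)

# Result is assigned back, so buf now holds the UTF-8 bytes.
buf = reencode(buf)
```

After the first call unchanged is True, and only the assignment changes what buf refers to. Note that on Python 3 the files would also have to be opened in binary mode ("rb" and "wb") so that buf is a bytes object with a decode method.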
Also, if you're on Python 2, it doesn't make sense to use parentheses with print; it can be confusing.
Upvotes: 1