Reputation: 26406
I am converting a Python 2 program to Python 3 and I'm not sure about the approach to take.
The program reads either a single email from STDIN or one or more files containing emails. It then parses the emails and does some processing on them.
So we need to work with the raw data of the email input, to store it on disk and compute an MD5 hash of it. We also need to work with the text of the email input in order to run it through the Python email parser and extract fields etc.
With Python 3 it is unclear to me how we should be reading in the data. I believe we need the raw binary data in order to do an md5 on it, and also to be able to write it to disk. I understand we also need it in text form to be able to parse it with the email library. Python 3 has made significant changes to the IO handling and text handling and I can't see the "correct" approach to read the email raw data and also use the same data in text form.
Can anyone offer general guidance on this?
Upvotes: 1
Views: 287
Reputation: 176950
The general guidance is to convert everything to unicode as early as possible and keep it that way until the last possible minute.
Remember that str is the old unicode and bytes is the old str.
See http://docs.python.org/dev/howto/unicode.html for a start.
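As a small illustration of the str/bytes split, here is a minimal sketch. The sample value and the choice of UTF-8 are assumptions made for the example, not something taken from the question.

```python
# Illustrative sample: the UTF-8 encoding of "café" (5 bytes, 4 characters).
raw = b"caf\xc3\xa9"

# bytes -> str: decode once, as early as possible.
text = raw.decode("utf-8")

# str -> bytes: encode again only at the boundary (disk, network, hashing).
round_tripped = text.encode("utf-8")

print(type(raw).__name__, len(raw))    # bytes 5
print(type(text).__name__, len(text))  # str 4
print(round_tripped == raw)            # True
```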
> With Python 3 it is unclear to me how we should be reading in the data.

Specify the encoding when you open the file and it will automatically give you unicode. If you're reading from sys.stdin, you'll get unicode. You can read from sys.stdin.buffer to get binary data.
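The two ways of reading could be sketched roughly as follows. The helper names `read_email_text` and `read_stdin_bytes` are hypothetical, and the UTF-8 default is an assumption; use whatever encoding your emails actually have.

```python
import sys


def read_email_text(path, encoding="utf-8"):
    """Read a file in text mode; open() decodes the bytes using `encoding`."""
    # The utf-8 default is an assumption for this sketch.
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


def read_stdin_bytes():
    """Read all of stdin as raw, undecoded bytes via sys.stdin.buffer."""
    return sys.stdin.buffer.read()
```

Note that text mode also translates line endings on read by default, so the returned string is not guaranteed to match the file byte for byte.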
> I believe we need the raw binary data in order to do an md5 on it

Yes, you do. Call encode on it when you need to hash it.
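A minimal sketch of hashing text by encoding it first. The helper name `md5_of_text` is hypothetical and the UTF-8 default is an assumption.

```python
import hashlib


def md5_of_text(text, encoding="utf-8"):
    """Encode the str to bytes, then return the MD5 hex digest of those bytes."""
    raw = text.encode(encoding)  # hashlib only accepts bytes, not str
    return hashlib.md5(raw).hexdigest()


print(md5_of_text("hello"))  # 5d41402abc4b2a76b9719d911017c592
```

The digest equals the hash of the original input only if encoding the text reproduces the original bytes exactly, so the same encoding has to be used in both directions.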
> and also to be able to write it to disk.

You specify the encoding when you open the file you're writing it to, and the file object encodes it for you.
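Writing could look something like this. The helper name `write_email_text` is hypothetical and UTF-8 is an assumed default.

```python
def write_email_text(path, text, encoding="utf-8"):
    """Write a str to disk; the file object encodes it using `encoding`."""
    # Text mode ("w") accepts str only and encodes on the way out.
    with open(path, "w", encoding=encoding) as handle:
        handle.write(text)
```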
> I understand we also need it in text form to be able to parse it with the email library.

Yep, but since it'll get decoded when you open the file, that's what you'll have.
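Once the data is a str, parsing might be sketched like this with the standard library's `email.message_from_string`. The helper name `parse_email_text` and the sample message are made up for illustration.

```python
import email


def parse_email_text(text):
    """Parse an already-decoded email (a str) into a Message object."""
    return email.message_from_string(text)


# Hypothetical sample message, for illustration only.
sample = "From: alice@example.com\nSubject: Hello\n\nBody text\n"
message = parse_email_text(sample)

print(message["From"])     # alice@example.com
print(message["Subject"])  # Hello
```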
That said, this question is really too open-ended for Stack Overflow. When you have a specific problem or question, come back and we'll help.
Upvotes: 2