I have the following bytes object:
b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'
It has the following elements:
1) "path"
2) "/sync/u/0/i/bv"
3) "url_ids_md5"
4) "\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0"
5) "argcount"
6) "2"
7) "start_time"
8) "1605232804"
9) "arg.c"
10) "10"
11) "arg.hl"
12) "11"
I'm trying to break it up into a list of elements split on "\n", but I run into several problems.
Let's call the value in question s.
If I naively call s.split("\n"), it fails with the following error:
TypeError: a bytes-like object is required, not 'str'
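For illustration, here is a minimal sketch of what that error is pointing at, assuming s is exactly the bytes object shown above. bytes.split() expects a bytes separator, so passing b"\n" instead of "\n" avoids the TypeError. The names parts and fields are only illustrative.

```python
s = b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'

# Split on a bytes separator; every element of the result is still bytes.
parts = s.split(b"\n")

# s ends with b"\n", so the last element is an empty bytes object; drop it.
fields = parts[:-1] if parts and parts[-1] == b"" else parts

for field in fields:
    print(field)
```

One caveat with this approach: a raw 16-byte digest could in principle contain the byte 0x0a itself, in which case splitting on b"\n" would cut the digest in two. The digest in this particular value does not.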
So then I tried decoding before splitting: s.decode().split("\n"). The problem is that s contains a raw MD5 digest, and those bytes are not valid under the default UTF-8 encoding that decode() uses:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 32: invalid continuation byte
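One possible way around the decoding problem, sketched under assumptions: split the bytes first, then decode only the textual fields and keep the digest as hex. The sketch assumes the fields alternate key, value, key, value (as the listing above suggests) and that every field other than the url_ids_md5 value is plain ASCII. The names fields and record are hypothetical.

```python
s = b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'

# Decoding the whole value fails because the digest is not valid UTF-8.
try:
    s.decode()
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc  # exc.start is the offset of the offending byte

# Split while still in bytes, then decode field by field.
fields = s.split(b"\n")

record = {}
# Assumption: fields alternate key, value. zip() discards the unpaired
# empty element produced by the trailing b"\n".
for key, value in zip(fields[0::2], fields[1::2]):
    name = key.decode("ascii")
    if name == "url_ids_md5":
        record[name] = value.hex()  # keep the binary digest as a hex string
    else:
        record[name] = value.decode("ascii")

print(record)
```

Decoding the whole value with "latin-1" would also avoid the exception, since every byte maps to a code point, but the digest would then turn into arbitrary characters rather than stay recognizable as binary data.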
Next, instead of decode(), I tried applying str() to s. I get the following:
b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'
(Note that it still looks encoded, I think, as evidenced by the b' in front.) Interestingly, calling .split("\n") on the result gives the following:
["b'path\\n/sync/u/0/i/bv\\nurl_ids_md5\\n\\xd9\\xdd\\x80fF>(\\xf4?\\xcc\\x86c|\\xd2\\xd3\\xf0\\nargcount\\n2\\nstart_time\\n1605232804\\narg.c\\n10\\narg.hl\\n11\\n'"]
That in itself doesn't make sense to me. (But note that .split("\n") no longer raises an error here, unlike with my original value in the first example.)
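A short sketch of what str() appears to be doing here, using a small hypothetical sample value rather than the full s: called with no encoding argument, str() on a bytes object returns its printable representation, including the b'...' wrapper, with each newline written as the two characters backslash and n. The result therefore contains no real newline for split("\n") to find.

```python
sample = b"a\nb"  # hypothetical short stand-in for s

# Without an encoding, str() returns the repr of the bytes object.
text = str(sample)
print(text)       # the characters b ' a \ n b ' with a literal backslash
print(len(text))  # 7 characters, although sample holds only 3 bytes

# There is no real newline in text, so split() returns a single element.
pieces = text.split("\n")
print(pieces)

# Passing an encoding makes str() decode instead of taking the repr.
decoded = str(sample, "ascii")
print(decoded.split("\n"))
```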
How do I properly split the string?