ajfbiw.s

Reputation: 401

Python3: split encoded string

I have the following bytes object:

b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'

It has the following elements:

 1) "path"
 2) "/sync/u/0/i/bv"
 3) "url_ids_md5"
 4) "\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0"
 5) "argcount"
 6) "2"
 7) "start_time"
 8) "1605232804"
 9) "arg.c"
10) "10"
11) "arg.hl"
12) "11"

I'm trying to break it up into a list of elements split on "\n". However, I keep running into problems:

Let's call the string in question s.

If I naively call s.split("\n"), it of course fails with the following error:

TypeError: a bytes-like object is required, not 'str'
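The error comes from mixing types: s is a bytes object, and bytes.split only accepts a bytes-like separator, not a str. A minimal sketch reproducing it, using a shortened copy of the data (only the first four fields, an assumption made to keep the example short):

```python
# Shortened copy of the bytes from the question (first four fields only).
s = b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\n'

try:
    s.split("\n")              # str separator on a bytes object
except TypeError as exc:
    message = str(exc)
    print(message)             # a bytes-like object is required, not 'str'

# The separator must be bytes too; "\n".encode() and b"\n" are the same value.
parts = s.split(b"\n")
print(parts[:3])               # [b'path', b'/sync/u/0/i/bv', b'url_ids_md5']
```

Every element of parts is still bytes, so nothing has to be decoded to split the data.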

So I call decode() before splitting: s.decode().split("\n"). The problem is that s contains a raw MD5 digest (16 arbitrary bytes), which is not valid UTF-8, the default encoding used by decode():

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 32: invalid continuation byte
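The decode fails because the 16 bytes after url_ids_md5 are binary data, not UTF-8 text: 0xd9 starts a two-byte UTF-8 sequence, but the following byte, 0xdd, is not a valid continuation byte. If a str really is wanted, one option (a sketch, not the only approach) is a decoding that accepts every byte value, such as Latin-1, or UTF-8 with errors="surrogateescape"; both round-trip back to the original bytes:

```python
s = b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'

# Latin-1 maps each byte 0x00-0xFF to the code point with the same value,
# so decoding cannot fail and encoding restores the exact bytes.
text = s.decode("latin-1")
fields = text.split("\n")
digest = fields[3].encode("latin-1")
print(len(digest))             # 16, the size of an MD5 digest

# surrogateescape keeps valid UTF-8 as text and carries invalid bytes
# through as lone surrogates; encoding with the same handler reverses it.
text2 = s.decode("utf-8", errors="surrogateescape")
print(text2.encode("utf-8", errors="surrogateescape") == s)   # True
```

Splitting the bytes directly avoids this detour entirely; decoding like this only makes sense if the rest of the code needs str.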

Next, instead of decode(), I tried applying str() to s and got the following:

b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'

(Note that it still seems to be UTF-8 encoded, I think, as evidenced by the b' in front.) Interestingly, calling .split("\n") on the result gives the following:

["b'path\\n/sync/u/0/i/bv\\nurl_ids_md5\\n\\xd9\\xdd\\x80fF>(\\xf4?\\xcc\\x86c|\\xd2\\xd3\\xf0\\nargcount\\n2\\nstart_time\\n1605232804\\narg.c\\n10\\narg.hl\\n11\\n'"]

This doesn't make sense to me. (Note, though, that .split() now runs without an error, unlike on my original string in the first example.)
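What happens here: str() on a bytes object does not decode it; it returns the same text as repr(s), the printable representation including the b' prefix, the quotes, and each newline written as the two characters backslash and n. That string contains no real newline, so .split("\n") has nothing to split on and returns a one-element list. The doubled backslashes in the output appear because printing a list shows the repr of each element. A short sketch with a shortened copy of the data:

```python
s = b'path\n/sync/u/0/i/bv\n'   # shortened copy of the question's data

r = str(s)
print(r == repr(s))        # True: str() of bytes is its repr
print(r[:2])               # b'  (the prefix is part of the text now)
print("\n" in r)           # False: no real newline characters
print("\\n" in r)          # True: a backslash followed by 'n'
print(len(r.split("\n")))  # 1
```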

How do I properly split the string?

Upvotes: 0

Views: 484

Answers (1)

Lior

Reputation: 315

Try doing the following:

s.split("\n".encode())

Good luck!
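Building on the answer: "\n".encode() is just b"\n", and bytes.split with a bytes separator keeps every field as bytes, so the MD5 digest stays intact. A sketch of one way to post-process the result, assuming (this is not stated in the question) that the fields alternate key, value and that the keys are ASCII. The trailing "\n" in s produces an empty last element, which is dropped here:

```python
s = b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'

parts = s.split(b"\n")
print(len(parts))          # 13: the trailing newline leaves an empty last element
if parts and parts[-1] == b"":
    parts.pop()

# Assumption: fields alternate key, value, key, value, ...
pairs = dict(zip(parts[0::2], parts[1::2]))

# Keys are ASCII text, so decoding them is safe; values stay bytes.
fields = {k.decode("ascii"): v for k, v in pairs.items()}
print(fields["url_ids_md5"].hex())   # d9dd8066463e28f43fcc86637cd2d3f0
print(int(fields["start_time"]))     # 1605232804
```

One caveat with this format: a digest is arbitrary binary data, so a digest that happened to contain the byte 0x0a would itself be split in two. bytes.splitlines() would also drop the trailing empty element, but it additionally splits on \r (0x0d), which makes it riskier for binary fields.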

Upvotes: 1
