Python3: split encoded string

Question

I have

b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'

Which has the following elements:

 1) "path"
 2) "/sync/u/0/i/bv"
 3) "url_ids_md5"
 4) "\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0"
 5) "argcount"
 6) "2"
 7) "start_time"
 8) "1605232804"
 9) "arg.c"
10) "10"
11) "arg.hl"
12) "11"

I'm trying to break it up into a list of elements split on "\n". However, I run into several problems:

Let's call the string in question s.

If I naively call s.split("\n"), that, of course, does not work, and I get the following error:

TypeError: a bytes-like object is required, not 'str'

So then I decode() before trying to split: s.decode().split("\n"). The problem is that s contains a raw MD5 digest, and those bytes are not valid under the default UTF-8 decoding of decode():

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 32: invalid continuation byte

Next (instead of decode()) I tried applying str() to s. I get the following:

b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'

(Note that it's still very much utf-8 encoded (I think), as evidenced by the b' in front.) Interestingly, the result of doing .split("\n") on the above is the following:

["b'path\n/sync/u/0/i/bv\nurl_ids_md5\n\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\nargcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n'"]

Which in itself doesn't make sense to me. (But do note that it allows me to do .split() now, as opposed to my original string / first example)

How do I properly split the string?

Lior · Accepted Answer

Try doing the following:

s.split("\n".encode())
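
To make that concrete, here is a minimal sketch using the value from the question. It relies on bytes.split() accepting a bytes separator, so b"\n" works the same as "\n".encode(). The names parts and fields, and the idea that the elements alternate key, value, are assumptions for illustration and not something the question states.

```python
# Sample bytes value copied from the question.
s = (b'path\n/sync/u/0/i/bv\nurl_ids_md5\n'
     b'\xd9\xdd\x80fF>(\xf4?\xcc\x86c|\xd2\xd3\xf0\n'
     b'argcount\n2\nstart_time\n1605232804\narg.c\n10\narg.hl\n11\n')

# Split on a bytes separator; b"\n" is equivalent to "\n".encode().
parts = s.split(b"\n")

# The trailing newline leaves one empty element at the end; drop it.
if parts and parts[-1] == b"":
    parts.pop()

# Assumption: the elements alternate key, value, so pair them up.
fields = dict(zip(parts[0::2], parts[1::2]))

# Text-like pieces can be decoded one at a time; the digest stays as bytes.
path = fields[b"path"].decode()
digest_hex = fields[b"url_ids_md5"].hex()
```

One caveat: a raw digest can in principle contain the byte 0x0a, in which case splitting on b"\n" would cut the digest in two. This particular value does not contain it.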

Good Luck!
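
As for why the str() attempt gave a one-element list: str() on a bytes object does not decode anything, it returns the printable representation, including the leading b' and the two characters backslash and n wherever the data has a newline. A small sketch (the shortened sample value is an assumption for brevity):

```python
# Shortened sample; the behaviour is the same for the full value.
s = b'path\n/sync/u/0/i/bv\n'

# str() returns the repr of the bytes object, not decoded text.
text = str(s)

# text holds a literal backslash followed by "n", not a newline character,
# so there is nothing for split("\n") to split on.
pieces = text.split("\n")
print(len(pieces))
```

So the b' in front is part of the string itself at that point, and the result is one long element.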
