delaflota

Reputation: 179

Which encoding format for reading superscript (NG²) and latin-1 in python?

I read Latin playlists from Spotify with a Python (2.7) script.
Up to now, latin-1 worked fine.

But then I encountered a name like NG², and the script stopped working.

This is the error message:

...
Solo Fue Una Noche;NG²;Comienzos;9;2004 (printed by a print() cmd)
Traceback (most recent call last):
  File "get_playlist-tracks.py", line 110, in <module>
    ndt.write(line+"\n").encode('latin-1')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 21-22: ordinal not in range(128)

I think I need a combined encoding for latin-1 and superscript characters.
Is that correct, and does anyone know which one would be right?


Thanks for the many answers!
Well, it's a bit more complicated:

I now have three Win10 (64-bit) installations (WinA, WinB, WinC).
On WinA (the oldest one, from 2011, migrated from Win7), everything works fine (Python 3.4).
On WinB and WinC (newest hardware, Python 3.6), the curl command exits with code 1,
and nobody knows why.
Since I want to get rid of the old WinA and keep using my Python scripts, I am trying the scripts on
a Fedora 20 guest in VMware Player 12.
The superscript problem occurs only on the Fedora system
(not on WinA; WinB and WinC are out of the picture because of the curl issue).

I use the following first 2 lines in the script:

#!/usr/bin/python3.3
# -*- coding: utf-8 -*-

The error appears only when I try to write this line (with the superscript 2) to a file:

print (line)         # (works fine!)
ndt.write(line+"\n") # (this one not!)

I also tried the write command with .decode('utf-8') and .decode('latin-1'),
but I always get the same message...

Then I tried the following in the Python console:

>>> line="Solo Fue Una Noche;NG²;Comienzos;9;2004"
>>> playlist_name = '/home/.../Python/PLLs/Sole_01a_tracks.txt'
>>> ndt = open(playlist_name, 'w')
>>> ndt.write(line+"\n").decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'decode'
>>> ndt.write(line+"\n")
40
>>> line
'Solo Fue Una Noche;NG²;Comienzos;9;2004'
>>> playlist_name
'/home/.../Python/PLLs/Sole_01a_tracks.txt'
>>> ndt.write(line)
39

The code

print ("playlist_contents_file:", playlist_name)

prints out:

('playlist_contents_file:', u'/home/.../Python/PLLs/Sole_01a_tracks.txt')

Upvotes: 0

Views: 1098

Answers (1)

BoarGules

Reputation: 16941

The superscript 2 is not the problem. It is the Latin-1 character \xb2, so you don't need a different encoding. The problem is your call to encode() on a string of bytes that is already Latin-1.

First, understand that encode() takes a Unicode string and turns it into a sequence of bytes in the encoding you name. So you have to call it on a Unicode string. If you call encode() on a normal (byte) string, Python 2 first tries to coerce it to Unicode.

Because this is Python 2, your original string (line) is a string of bytes that can't be reliably coerced to Unicode unless you tell it what the encoding is. If you don't, and opt for default coercion, Python assumes ascii.
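The ASCII fallback can be seen in isolation. The sketch below uses Python 3 syntax, where the implicit coercion no longer exists, so it asks for the ASCII codec explicitly. That explicit call is an assumption made for illustration, not what the original script does.

```python
# Python 3 sketch: request the ASCII codec explicitly to reproduce
# the kind of failure that Python 2 triggers implicitly.
line = "Solo Fue Una Noche;NG\u00b2;Comienzos;9;2004"

failed_at = -1
try:
    line.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character ASCII cannot represent.
    failed_at = exc.start
    print(exc)

if failed_at >= 0:
    print(failed_at, repr(line[failed_at]))
```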

So you have to decode it out of latin-1 to get Unicode:

>>> line="Solo Fue Una Noche;NG²;Comienzos;9;2004"
>>> line
'Solo Fue Una Noche;NG\xb2;Comienzos;9;2004'
>>> line.decode('latin-1')
u'Solo Fue Una Noche;NG\xb2;Comienzos;9;2004'

In this case the Unicode code point and the Latin-1 byte value of your non-ascii character are the same (U+00B2 and 0xB2), because the first 256 Unicode code points coincide with Latin-1. With another encoding they would differ, which is why you have to specify the decoding. You now have a unicode string, to which you can append '\n':

>>> line.decode('latin-1')+"\n"
u'Solo Fue Una Noche;NG\xb2;Comienzos;9;2004\n'

Then you can encode this Unicode string back into Latin-1 for output:

>>> (line.decode('latin-1')+"\n").encode('latin-1')
'Solo Fue Una Noche;NG\xb2;Comienzos;9;2004\n'
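For comparison, here is the same round trip as a Python 3 sketch, where byte strings and text strings are distinct types. The UTF-8 line is an addition for illustration only: it shows that the byte values depend on the codec you pick.

```python
# Python 3 sketch of the decode/encode round trip shown above.
raw = b"Solo Fue Una Noche;NG\xb2;Comienzos;9;2004"  # Latin-1 bytes

text = raw.decode("latin-1")  # bytes -> str (Unicode)

as_latin1 = (text + "\n").encode("latin-1")
as_utf8 = (text + "\n").encode("utf-8")

# The Latin-1 round trip gives back the original bytes plus the newline.
print(as_latin1 == raw + b"\n")

# UTF-8 represents U+00B2 with two bytes, so the lengths differ.
print(len(as_latin1), len(as_utf8))
```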

But for what you are doing you don't need encode() at all. You say you are getting Latin-1 from Spotify. You want your output to be Latin-1. So you can just append "\n" to your input string and write it out.

>>> line="Solo Fue Una Noche;NG²;Comienzos;9;2004"
>>> line + "\n"
'Solo Fue Una Noche;NG\xb2;Comienzos;9;2004\n'
>>> ndt.write(line+"\n")
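If the script runs under Python 3, as the question's update suggests, `line` is already text and the file object does the encoding. A minimal sketch, assuming the output should be Latin-1: pass `encoding` to `open()` instead of relying on the platform default, which under an ASCII/C locale may be unable to represent `²`. The temporary path is hypothetical and stands in for the real playlist file.

```python
import os
import tempfile

line = "Solo Fue Una Noche;NG\u00b2;Comienzos;9;2004"

# Hypothetical output path; the real script would use playlist_name.
path = os.path.join(tempfile.mkdtemp(), "tracks.txt")

# Name the encoding explicitly so the result does not depend on the locale.
with open(path, "w", encoding="latin-1", newline="\n") as ndt:
    ndt.write(line + "\n")

# Read the raw bytes back to check what actually landed on disk.
with open(path, "rb") as check:
    data = check.read()

print(data)
```

Note that `write()` returns the number of characters written in Python 3, so chaining `.encode()` or `.decode()` onto it cannot work; the encoding belongs in the `open()` call.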

Upvotes: 1
