Reputation: 19
I'm trying to copy attachments from one Confluence page to another in Python 3.9 via the REST API. While doing that, I found a .docx file with some strange characters in its filename. Download link to file
The filename is as follows: Template_Anfrage Eingangsbestätigung.docx
If I delete the character 'ä', I get this: Template_Anfrage Eingangsbestatigung.docx
I would expect this: Template_Anfrage Eingangsbesttigung.docx
Can you tell me what caused this problem? And if you could tell me how to convert these characters to normal UTF-8 characters, that would be awesome.
Sorry for my bad English, and sorry if this is a stupid question. I'm an absolute beginner and I didn't find a solution on the web because I don't really know what to search for.
Upvotes: 0
Views: 326
Reputation: 30113
The 'ä' on mac (ä) is different to the 'ä' on windows (ä)
Your issue does not stem from an OS difference (Mac versus Windows); rather, it is about Unicode normalization. See the following script and its output:
import unicodedata

def printref( phase, strings ):
    # Print each string, then each of its code points with category and name.
    global origins
    linetemplate = '{0:<10} {1:<4} {2:4} {3:4} {4:4} {5}'
    print( '' )
    # Header line: normalization phase and whether both strings compare equal.
    print( chr(0x20)*10, phase.ljust(9,chr(0x20)), strings[0]==strings[1] )
    for ii, chars in enumerate( strings):
        print( linetemplate.format( origins[ii], len(chars), chars,
              chars.encode('utf-8').decode('cp1252'),  # mojibake
              '', ''
              ))
        for char in chars:
            print( linetemplate.format( '', len(char), char,
                  char.encode('utf-8').decode('cp1252'),  # mojibake
                  unicodedata.category(char),
                  unicodedata.name(char,'???') ) )

# Escapes keep the two forms distinct even if the text is re-normalized on copy:
# 'a' + U+0308 COMBINING DIAERESIS (filename) vs. precomposed U+00E4 (question).
strings = ['a\u0308', '\u00e4']
origins = ['filename', 'question']
printref( 'original', strings)
for form in ['NFKC', 'NFKD']:
    printref( form, [ unicodedata.normalize(form, x) for x in strings] )
Output: .\SO\68919847.py
original False
filename 2 ä ä
1 a a Ll LATIN SMALL LETTER A
1 ̈ ̈ Mn COMBINING DIAERESIS
question 1 ä ä
1 ä ä Ll LATIN SMALL LETTER A WITH DIAERESIS
NFKC True
filename 1 ä ä
1 ä ä Ll LATIN SMALL LETTER A WITH DIAERESIS
question 1 ä ä
1 ä ä Ll LATIN SMALL LETTER A WITH DIAERESIS
NFKD True
filename 2 ä ä
1 a a Ll LATIN SMALL LETTER A
1 ̈ ̈ Mn COMBINING DIAERESIS
question 2 ä ä
1 a a Ll LATIN SMALL LETTER A
1 ̈ ̈ Mn COMBINING DIAERESIS
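To apply this to the filename from the question, a minimal sketch could look like the following. It assumes the attachment name arrives in decomposed form (as the 'filename' rows above suggest); the `to_nfc` helper and the escaped filename literal are illustrative, not part of the original script. NFC is used here because it only composes base letter and combining mark pairs; NFKC, as in the script above, would work for this name too but also folds compatibility characters.

```python
import unicodedata

# Hypothetical reconstruction of the attachment name: 'a' followed by
# U+0308 COMBINING DIAERESIS (decomposed form), written with an escape
# so the form cannot change when this snippet is copied.
filename = 'Template_Anfrage Eingangsbesta\u0308tigung.docx'

def to_nfc(name: str) -> str:
    """Return name with base letter + combining mark pairs composed."""
    return unicodedata.normalize('NFC', name)

normalized = to_nfc(filename)

# The decomposed name is one code point longer than the composed one.
print(len(filename), len(normalized))

# The precomposed 'ä' (U+00E4) exists only after normalization ...
print('\u00e4' in filename, '\u00e4' in normalized)

# ... so removing it now drops the whole letter, not just the diaeresis.
print(normalized.replace('\u00e4', ''))
```

Normalizing the name once, right after reading it from the source page, should keep later comparisons and replacements consistent.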
Unfortunately, my browser renders the decomposed ä (a + combining diaeresis) and the precomposed ä in the same way; the following picture shows the difference better:
Upvotes: 1