Reputation: 1306
Consider a Unicode character, such as the zero-width space, which is not on any conventional keyboard and is not part of any human writing system. Suppose one wants to use Perl to remove this character from a string, or to print the character from a Unix shell such as bash.
This post reviews how one can do these things using hexadecimal code, and then asks: Is there a more direct (or elegant) way to do these things, using perhaps the decimal representation of the character?
The "zero-width space" http://www.unicode-symbol.com/u/200B.html shows up occasionally in text files.
For instance, on a MacBook Pro, I saved an SMS conversation from Messages.app as a PDF. Then I opened the PDF in Preview, copied everything, and pasted the clipboard into a file z. Then less z showed many instances of <U+200B>, and when I opened the file in vim it showed up as <200b>.
Similarly, "pop directional formatting", http://www.unicode-symbol.com/u/202C.html, shows up when I copy and paste a phone number from the telephone field of Contacts.app.
Often I want to get the plain text from a string---anything that a human being would actually want to read, including letters in any language such as French é, Greek β, Arabic, Chinese and of course tab, space, and newline---without other characters.
This is because the other characters can cause problems. Not only are they a distraction in less and vim, but they also seem to cause LaTeX (pdflatex) to throw an error.
One can remove the zero-width space by matching its UTF-8 byte sequence, \xe2\x80\x8b, as follows:
perl -p -e 's/\xe2\x80\x8b//g;' myfile
Using the same approach, one can print the character:
printf '\xe2\x80\x8b'
But on the same row in http://www.unicode-symbol.com/u/200B.html where one obtains the triad of hexadecimal numbers, one also finds that the decimal representation is 14844043. Is there a way to use this decimal representation, or some other approach more direct than pasting together three hexadecimal codes?
Upvotes: 1
Views: 488
Reputation: 241858
Elegance is in the eye of the beholder.
But the -C switch enables Perl's Unicode handling, so you can take advantage of that.
perl -CD -wpe 's/\x{200B}//g' file
Also, you can use \N to specify the full names of the characters:
perl -CD -wpe 's/\N{ZERO WIDTH SPACE}//g' file
See perlrun for the details of -C. In particular, -CD is equivalent to -Cio, which means "make UTF-8 the default PerlIO layer for input and output streams".
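On the decimal form the question asks about: 14844043 appears to be the three UTF-8 bytes read as one integer, not the code point, which is 8203 in decimal. The sketch below checks that arithmetic and then uses chr(8203) in Perl. The helper names print_zwsp and strip_zwsp are made up for illustration, and the -CS and -CSD flags are assumed here so that the standard streams are treated as UTF-8 as well.

```shell
# 14844043 is the UTF-8 byte sequence e2 80 8b read as a single integer
printf '%x\n' 14844043                   # prints e2808b

# The code point itself, U+200B, is 8203 in decimal
printf '%d\n' 0x200B                     # prints 8203

# Print the character from its decimal code point
# (-CS makes Perl encode STDOUT as UTF-8)
print_zwsp() { perl -CS -e 'print chr(8203)'; }

# Remove the character, again starting from the decimal code point
strip_zwsp() { perl -CSD -pe 'BEGIN { $zwsp = chr(8203) } s/$zwsp//g' "$@"; }
```

strip_zwsp reads the files named as arguments, or standard input when none are given, for example: strip_zwsp myfile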
Upvotes: 4
Reputation: 385744
To remove U+200B ZERO WIDTH SPACE specifically:
perl -CSD -pe's/\x{200B}//g'
perl -CSD -pe's/\N{U+200B}//g'
perl -CSD -pe's/\N{ZERO WIDTH SPACE}//g'
-CSD handles encoding/decoding STDIN/STDOUT/STDERR/ARGV. (UTF-8, specifically.)
See also: "Specifying file to process to Perl one-liner".
That said, it sounds like you want a more general approach that would match "characters like ZERO WIDTH SPACE", not just ZERO WIDTH SPACE. But it's unclear what that means. Here are the properties ZERO WIDTH SPACE has:
$ uniprops -a1 200B
U+200B ‹U+200B› \N{ZERO WIDTH SPACE}
\pC
\p{Cf}
All
Any
Assigned
C
Other
Case_Ignorable
CI
Cf
Format
Changes_When_NFKC_Casefolded
CWKCF
Common
Zyyy
Default_Ignorable_Code_Point
DI
General_Punctuation
InPunctuation
Graph
X_POSIX_Graph
Print
X_POSIX_Print
Unicode
Age=1.1
Age=V1_1
Bidi_Class=BN
Bidi_Class=Boundary_Neutral
BC=BN
Bidi_Paired_Bracket_Type=None
Block=General_Punctuation
BLK=Punctuation
Block=Punctuation
Canonical_Combining_Class=0
Canonical_Combining_Class=Not_Reordered
CCC=NR
Canonical_Combining_Class=NR
Script_Extensions=Common
Decomposition_Type=None
DT=None
East_Asian_Width=Neutral
Grapheme_Cluster_Break=CN
Grapheme_Cluster_Break=Control
GCB=CN
Hangul_Syllable_Type=NA
Hangul_Syllable_Type=Not_Applicable
HST=NA
Identifier_Status=Restricted
Identifier_Type=Default_Ignorable
Indic_Positional_Category=NA
InPC=NA
Indic_Syllabic_Category=Other
InSC=Other
Joining_Group=No_Joining_Group
JG=NoJoiningGroup
Joining_Type=T
Joining_Type=Transparent
JT=T
Line_Break=ZW
Line_Break=ZWSpace
LB=ZW
Numeric_Type=None
NT=None
Numeric_Value=NaN
NV=NaN
Present_In=1.1
IN=1.1
Present_In=2.0
IN=2.0
Present_In=V2_0
Present_In=2.1
IN=2.1
Present_In=V2_1
Present_In=3.0
IN=3.0
Present_In=V3_0
Present_In=3.1
IN=3.1
Present_In=V3_1
Present_In=3.2
IN=3.2
Present_In=V3_2
Present_In=4.0
IN=4.0
Present_In=V4_0
Present_In=4.1
IN=4.1
Present_In=V4_1
Present_In=5.0
IN=5.0
Present_In=V5_0
Present_In=5.1
IN=5.1
Present_In=V5_1
Present_In=5.2
IN=5.2
Present_In=V5_2
Present_In=6.0
IN=6.0
Present_In=V6_0
Present_In=6.1
IN=6.1
Present_In=V6_1
Present_In=6.2
IN=6.2
Present_In=V6_2
Present_In=6.3
IN=6.3
Present_In=V6_3
Present_In=7.0
IN=7.0
Present_In=V7_0
Present_In=8.0
IN=8.0
Present_In=V8_0
Present_In=9.0
IN=9.0
Present_In=V9_0
Present_In=10.0
IN=10.0
Present_In=V10_0
Present_In=11.0
IN=11.0
Present_In=V11_0
Present_In=12.0
IN=12.0
Present_In=V12_0
Present_In=12.1
IN=12.1
Present_In=V12_1
Present_In=13.0
IN=13.0
Present_In=V13_0
Script=Common
SC=Zyyy
Script=Zyyy
Scx=Zyyy
Script_Extensions=Zyyy
Sentence_Break=FO
Sentence_Break=Format
SB=FO
Vertical_Orientation=R
Vertical_Orientation=Rotated
Vo=R
Word_Break=Other
WB=XX
Word_Break=XX
The first two might be the ones of interest to you.
\p{General_Category=Format}, aka \p{Gc=Cf}, aka \p{Format}, aka \p{Cf}:
perl -CSD -pe's/\p{Cf}//g'
This property is shared by the following 161 Code Points:
$ unichars -a '\p{Cf}' | cat
---- U+000AD SOFT HYPHEN
---- U+00600 ARABIC NUMBER SIGN
---- U+00601 ARABIC SIGN SANAH
---- U+00602 ARABIC FOOTNOTE MARKER
---- U+00603 ARABIC SIGN SAFHA
---- U+00604 ARABIC SIGN SAMVAT
---- U+00605 ARABIC NUMBER MARK ABOVE
---- U+0061C ARABIC LETTER MARK
---- U+006DD ARABIC END OF AYAH
---- U+0070F SYRIAC ABBREVIATION MARK
---- U+008E2 ARABIC DISPUTED END OF AYAH
---- U+0180E MONGOLIAN VOWEL SEPARATOR
---- U+0200B ZERO WIDTH SPACE
---- U+0200C ZERO WIDTH NON-JOINER
---- U+0200D ZERO WIDTH JOINER
---- U+0200E LEFT-TO-RIGHT MARK
---- U+0200F RIGHT-TO-LEFT MARK
---- U+0202A LEFT-TO-RIGHT EMBEDDING
---- U+0202B RIGHT-TO-LEFT EMBEDDING
---- U+0202C POP DIRECTIONAL FORMATTING
---- U+0202D LEFT-TO-RIGHT OVERRIDE
---- U+0202E RIGHT-TO-LEFT OVERRIDE
---- U+02060 WORD JOINER
---- U+02061 FUNCTION APPLICATION
---- U+02062 INVISIBLE TIMES
---- U+02063 INVISIBLE SEPARATOR
---- U+02064 INVISIBLE PLUS
---- U+02066 LEFT-TO-RIGHT ISOLATE
---- U+02067 RIGHT-TO-LEFT ISOLATE
---- U+02068 FIRST STRONG ISOLATE
---- U+02069 POP DIRECTIONAL ISOLATE
---- U+0206A INHIBIT SYMMETRIC SWAPPING
---- U+0206B ACTIVATE SYMMETRIC SWAPPING
---- U+0206C INHIBIT ARABIC FORM SHAPING
---- U+0206D ACTIVATE ARABIC FORM SHAPING
---- U+0206E NATIONAL DIGIT SHAPES
---- U+0206F NOMINAL DIGIT SHAPES
---- U+0FEFF ZERO WIDTH NO-BREAK SPACE
---- U+0FFF9 INTERLINEAR ANNOTATION ANCHOR
---- U+0FFFA INTERLINEAR ANNOTATION SEPARATOR
---- U+0FFFB INTERLINEAR ANNOTATION TERMINATOR
---- U+110BD KAITHI NUMBER SIGN
---- U+110CD KAITHI NUMBER SIGN ABOVE
---- U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER
---- U+13431 EGYPTIAN HIEROGLYPH HORIZONTAL JOINER
---- U+13432 EGYPTIAN HIEROGLYPH INSERT AT TOP START
---- U+13433 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START
---- U+13434 EGYPTIAN HIEROGLYPH INSERT AT TOP END
---- U+13435 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM END
---- U+13436 EGYPTIAN HIEROGLYPH OVERLAY MIDDLE
---- U+13437 EGYPTIAN HIEROGLYPH BEGIN SEGMENT
---- U+13438 EGYPTIAN HIEROGLYPH END SEGMENT
---- U+1BCA0 SHORTHAND FORMAT LETTER OVERLAP
---- U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP
---- U+1BCA2 SHORTHAND FORMAT DOWN STEP
---- U+1BCA3 SHORTHAND FORMAT UP STEP
---- U+1D173 MUSICAL SYMBOL BEGIN BEAM
---- U+1D174 MUSICAL SYMBOL END BEAM
---- U+1D175 MUSICAL SYMBOL BEGIN TIE
---- U+1D176 MUSICAL SYMBOL END TIE
---- U+1D177 MUSICAL SYMBOL BEGIN SLUR
---- U+1D178 MUSICAL SYMBOL END SLUR
---- U+1D179 MUSICAL SYMBOL BEGIN PHRASE
---- U+1D17A MUSICAL SYMBOL END PHRASE
---- U+E0001 LANGUAGE TAG
---- U+E0020 TAG SPACE
---- U+E0021 TAG EXCLAMATION MARK
---- U+E0022 TAG QUOTATION MARK
---- U+E0023 TAG NUMBER SIGN
---- U+E0024 TAG DOLLAR SIGN
---- U+E0025 TAG PERCENT SIGN
---- U+E0026 TAG AMPERSAND
---- U+E0027 TAG APOSTROPHE
---- U+E0028 TAG LEFT PARENTHESIS
---- U+E0029 TAG RIGHT PARENTHESIS
---- U+E002A TAG ASTERISK
---- U+E002B TAG PLUS SIGN
---- U+E002C TAG COMMA
---- U+E002D TAG HYPHEN-MINUS
---- U+E002E TAG FULL STOP
---- U+E002F TAG SOLIDUS
---- U+E0030 TAG DIGIT ZERO
---- U+E0031 TAG DIGIT ONE
---- U+E0032 TAG DIGIT TWO
---- U+E0033 TAG DIGIT THREE
---- U+E0034 TAG DIGIT FOUR
---- U+E0035 TAG DIGIT FIVE
---- U+E0036 TAG DIGIT SIX
---- U+E0037 TAG DIGIT SEVEN
---- U+E0038 TAG DIGIT EIGHT
---- U+E0039 TAG DIGIT NINE
---- U+E003A TAG COLON
---- U+E003B TAG SEMICOLON
---- U+E003C TAG LESS-THAN SIGN
---- U+E003D TAG EQUALS SIGN
---- U+E003E TAG GREATER-THAN SIGN
---- U+E003F TAG QUESTION MARK
---- U+E0040 TAG COMMERCIAL AT
---- U+E0041 TAG LATIN CAPITAL LETTER A
---- U+E0042 TAG LATIN CAPITAL LETTER B
---- U+E0043 TAG LATIN CAPITAL LETTER C
---- U+E0044 TAG LATIN CAPITAL LETTER D
---- U+E0045 TAG LATIN CAPITAL LETTER E
---- U+E0046 TAG LATIN CAPITAL LETTER F
---- U+E0047 TAG LATIN CAPITAL LETTER G
---- U+E0048 TAG LATIN CAPITAL LETTER H
---- U+E0049 TAG LATIN CAPITAL LETTER I
---- U+E004A TAG LATIN CAPITAL LETTER J
---- U+E004B TAG LATIN CAPITAL LETTER K
---- U+E004C TAG LATIN CAPITAL LETTER L
---- U+E004D TAG LATIN CAPITAL LETTER M
---- U+E004E TAG LATIN CAPITAL LETTER N
---- U+E004F TAG LATIN CAPITAL LETTER O
---- U+E0050 TAG LATIN CAPITAL LETTER P
---- U+E0051 TAG LATIN CAPITAL LETTER Q
---- U+E0052 TAG LATIN CAPITAL LETTER R
---- U+E0053 TAG LATIN CAPITAL LETTER S
---- U+E0054 TAG LATIN CAPITAL LETTER T
---- U+E0055 TAG LATIN CAPITAL LETTER U
---- U+E0056 TAG LATIN CAPITAL LETTER V
---- U+E0057 TAG LATIN CAPITAL LETTER W
---- U+E0058 TAG LATIN CAPITAL LETTER X
---- U+E0059 TAG LATIN CAPITAL LETTER Y
---- U+E005A TAG LATIN CAPITAL LETTER Z
---- U+E005B TAG LEFT SQUARE BRACKET
---- U+E005C TAG REVERSE SOLIDUS
---- U+E005D TAG RIGHT SQUARE BRACKET
---- U+E005E TAG CIRCUMFLEX ACCENT
---- U+E005F TAG LOW LINE
---- U+E0060 TAG GRAVE ACCENT
---- U+E0061 TAG LATIN SMALL LETTER A
---- U+E0062 TAG LATIN SMALL LETTER B
---- U+E0063 TAG LATIN SMALL LETTER C
---- U+E0064 TAG LATIN SMALL LETTER D
---- U+E0065 TAG LATIN SMALL LETTER E
---- U+E0066 TAG LATIN SMALL LETTER F
---- U+E0067 TAG LATIN SMALL LETTER G
---- U+E0068 TAG LATIN SMALL LETTER H
---- U+E0069 TAG LATIN SMALL LETTER I
---- U+E006A TAG LATIN SMALL LETTER J
---- U+E006B TAG LATIN SMALL LETTER K
---- U+E006C TAG LATIN SMALL LETTER L
---- U+E006D TAG LATIN SMALL LETTER M
---- U+E006E TAG LATIN SMALL LETTER N
---- U+E006F TAG LATIN SMALL LETTER O
---- U+E0070 TAG LATIN SMALL LETTER P
---- U+E0071 TAG LATIN SMALL LETTER Q
---- U+E0072 TAG LATIN SMALL LETTER R
---- U+E0073 TAG LATIN SMALL LETTER S
---- U+E0074 TAG LATIN SMALL LETTER T
---- U+E0075 TAG LATIN SMALL LETTER U
---- U+E0076 TAG LATIN SMALL LETTER V
---- U+E0077 TAG LATIN SMALL LETTER W
---- U+E0078 TAG LATIN SMALL LETTER X
---- U+E0079 TAG LATIN SMALL LETTER Y
---- U+E007A TAG LATIN SMALL LETTER Z
---- U+E007B TAG LEFT CURLY BRACKET
---- U+E007C TAG VERTICAL LINE
---- U+E007D TAG RIGHT CURLY BRACKET
---- U+E007E TAG TILDE
---- U+E007F CANCEL TAG
\p{General_Category=Other}, aka \p{Gc=C}, aka \p{Other}, aka \p{C}, aka \pC:
perl -CSD -pe's/\pC//g'
\p{General_Category=Other} (\pC) includes:
\p{General_Category=Control} (\p{Cc}): 65 Code Points
\p{General_Category=Format} (\p{Cf}): 161 Code Points [Mentioned above]
\p{General_Category=Private_Use} (\p{Co}): 137,468 Code Points
\p{General_Category=Unassigned} (\p{Cn}): 830,672 Code Points
\p{General_Category=Surrogate} (\p{Cs}): 2,048 Code Points
Of those 970,414, the following are the 226 named ones (equivalent to [\p{Cc}\p{Cf}]):
$ unichars -a '\pC' | cat
---- U+00000 NULL
---- U+00001 START OF HEADING
---- U+00002 START OF TEXT
---- U+00003 END OF TEXT
---- U+00004 END OF TRANSMISSION
---- U+00005 ENQUIRY
---- U+00006 ACKNOWLEDGE
---- U+00007 ALERT
---- U+00008 BACKSPACE
---- U+00009 CHARACTER TABULATION
---- U+0000A LINE FEED
---- U+0000B LINE TABULATION
---- U+0000C FORM FEED
---- U+0000D CARRIAGE RETURN
---- U+0000E SHIFT OUT
---- U+0000F SHIFT IN
---- U+00010 DATA LINK ESCAPE
---- U+00011 DEVICE CONTROL ONE
---- U+00012 DEVICE CONTROL TWO
---- U+00013 DEVICE CONTROL THREE
---- U+00014 DEVICE CONTROL FOUR
---- U+00015 NEGATIVE ACKNOWLEDGE
---- U+00016 SYNCHRONOUS IDLE
---- U+00017 END OF TRANSMISSION BLOCK
---- U+00018 CANCEL
---- U+00019 END OF MEDIUM
---- U+0001A SUBSTITUTE
---- U+0001B ESCAPE
---- U+0001C INFORMATION SEPARATOR FOUR
---- U+0001D INFORMATION SEPARATOR THREE
---- U+0001E INFORMATION SEPARATOR TWO
---- U+0001F INFORMATION SEPARATOR ONE
---- U+0007F DELETE
---- U+00080 PADDING CHARACTER
---- U+00081 HIGH OCTET PRESET
---- U+00082 BREAK PERMITTED HERE
---- U+00083 NO BREAK HERE
---- U+00084 INDEX
---- U+00085 NEXT LINE
---- U+00086 START OF SELECTED AREA
---- U+00087 END OF SELECTED AREA
---- U+00088 CHARACTER TABULATION SET
---- U+00089 CHARACTER TABULATION WITH JUSTIFICATION
---- U+0008A LINE TABULATION SET
---- U+0008B PARTIAL LINE FORWARD
---- U+0008C PARTIAL LINE BACKWARD
---- U+0008D REVERSE LINE FEED
---- U+0008E SINGLE SHIFT TWO
---- U+0008F SINGLE SHIFT THREE
---- U+00090 DEVICE CONTROL STRING
---- U+00091 PRIVATE USE ONE
---- U+00092 PRIVATE USE TWO
---- U+00093 SET TRANSMIT STATE
---- U+00094 CANCEL CHARACTER
---- U+00095 MESSAGE WAITING
---- U+00096 START OF GUARDED AREA
---- U+00097 END OF GUARDED AREA
---- U+00098 START OF STRING
---- U+00099 SINGLE GRAPHIC CHARACTER INTRODUCER
---- U+0009A SINGLE CHARACTER INTRODUCER
---- U+0009B CONTROL SEQUENCE INTRODUCER
---- U+0009C STRING TERMINATOR
---- U+0009D OPERATING SYSTEM COMMAND
---- U+0009E PRIVACY MESSAGE
---- U+0009F APPLICATION PROGRAM COMMAND
---- U+000AD SOFT HYPHEN
---- U+00600 ARABIC NUMBER SIGN
---- U+00601 ARABIC SIGN SANAH
---- U+00602 ARABIC FOOTNOTE MARKER
---- U+00603 ARABIC SIGN SAFHA
---- U+00604 ARABIC SIGN SAMVAT
---- U+00605 ARABIC NUMBER MARK ABOVE
---- U+0061C ARABIC LETTER MARK
---- U+006DD ARABIC END OF AYAH
---- U+0070F SYRIAC ABBREVIATION MARK
---- U+008E2 ARABIC DISPUTED END OF AYAH
---- U+0180E MONGOLIAN VOWEL SEPARATOR
---- U+0200B ZERO WIDTH SPACE
---- U+0200C ZERO WIDTH NON-JOINER
---- U+0200D ZERO WIDTH JOINER
---- U+0200E LEFT-TO-RIGHT MARK
---- U+0200F RIGHT-TO-LEFT MARK
---- U+0202A LEFT-TO-RIGHT EMBEDDING
---- U+0202B RIGHT-TO-LEFT EMBEDDING
---- U+0202C POP DIRECTIONAL FORMATTING
---- U+0202D LEFT-TO-RIGHT OVERRIDE
---- U+0202E RIGHT-TO-LEFT OVERRIDE
---- U+02060 WORD JOINER
---- U+02061 FUNCTION APPLICATION
---- U+02062 INVISIBLE TIMES
---- U+02063 INVISIBLE SEPARATOR
---- U+02064 INVISIBLE PLUS
---- U+02066 LEFT-TO-RIGHT ISOLATE
---- U+02067 RIGHT-TO-LEFT ISOLATE
---- U+02068 FIRST STRONG ISOLATE
---- U+02069 POP DIRECTIONAL ISOLATE
---- U+0206A INHIBIT SYMMETRIC SWAPPING
---- U+0206B ACTIVATE SYMMETRIC SWAPPING
---- U+0206C INHIBIT ARABIC FORM SHAPING
---- U+0206D ACTIVATE ARABIC FORM SHAPING
---- U+0206E NATIONAL DIGIT SHAPES
---- U+0206F NOMINAL DIGIT SHAPES
---- U+0FEFF ZERO WIDTH NO-BREAK SPACE
---- U+0FFF9 INTERLINEAR ANNOTATION ANCHOR
---- U+0FFFA INTERLINEAR ANNOTATION SEPARATOR
---- U+0FFFB INTERLINEAR ANNOTATION TERMINATOR
---- U+110BD KAITHI NUMBER SIGN
---- U+110CD KAITHI NUMBER SIGN ABOVE
---- U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER
---- U+13431 EGYPTIAN HIEROGLYPH HORIZONTAL JOINER
---- U+13432 EGYPTIAN HIEROGLYPH INSERT AT TOP START
---- U+13433 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START
---- U+13434 EGYPTIAN HIEROGLYPH INSERT AT TOP END
---- U+13435 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM END
---- U+13436 EGYPTIAN HIEROGLYPH OVERLAY MIDDLE
---- U+13437 EGYPTIAN HIEROGLYPH BEGIN SEGMENT
---- U+13438 EGYPTIAN HIEROGLYPH END SEGMENT
---- U+1BCA0 SHORTHAND FORMAT LETTER OVERLAP
---- U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP
---- U+1BCA2 SHORTHAND FORMAT DOWN STEP
---- U+1BCA3 SHORTHAND FORMAT UP STEP
---- U+1D173 MUSICAL SYMBOL BEGIN BEAM
---- U+1D174 MUSICAL SYMBOL END BEAM
---- U+1D175 MUSICAL SYMBOL BEGIN TIE
---- U+1D176 MUSICAL SYMBOL END TIE
---- U+1D177 MUSICAL SYMBOL BEGIN SLUR
---- U+1D178 MUSICAL SYMBOL END SLUR
---- U+1D179 MUSICAL SYMBOL BEGIN PHRASE
---- U+1D17A MUSICAL SYMBOL END PHRASE
---- U+E0001 LANGUAGE TAG
---- U+E0020 TAG SPACE
---- U+E0021 TAG EXCLAMATION MARK
---- U+E0022 TAG QUOTATION MARK
---- U+E0023 TAG NUMBER SIGN
---- U+E0024 TAG DOLLAR SIGN
---- U+E0025 TAG PERCENT SIGN
---- U+E0026 TAG AMPERSAND
---- U+E0027 TAG APOSTROPHE
---- U+E0028 TAG LEFT PARENTHESIS
---- U+E0029 TAG RIGHT PARENTHESIS
---- U+E002A TAG ASTERISK
---- U+E002B TAG PLUS SIGN
---- U+E002C TAG COMMA
---- U+E002D TAG HYPHEN-MINUS
---- U+E002E TAG FULL STOP
---- U+E002F TAG SOLIDUS
---- U+E0030 TAG DIGIT ZERO
---- U+E0031 TAG DIGIT ONE
---- U+E0032 TAG DIGIT TWO
---- U+E0033 TAG DIGIT THREE
---- U+E0034 TAG DIGIT FOUR
---- U+E0035 TAG DIGIT FIVE
---- U+E0036 TAG DIGIT SIX
---- U+E0037 TAG DIGIT SEVEN
---- U+E0038 TAG DIGIT EIGHT
---- U+E0039 TAG DIGIT NINE
---- U+E003A TAG COLON
---- U+E003B TAG SEMICOLON
---- U+E003C TAG LESS-THAN SIGN
---- U+E003D TAG EQUALS SIGN
---- U+E003E TAG GREATER-THAN SIGN
---- U+E003F TAG QUESTION MARK
---- U+E0040 TAG COMMERCIAL AT
---- U+E0041 TAG LATIN CAPITAL LETTER A
---- U+E0042 TAG LATIN CAPITAL LETTER B
---- U+E0043 TAG LATIN CAPITAL LETTER C
---- U+E0044 TAG LATIN CAPITAL LETTER D
---- U+E0045 TAG LATIN CAPITAL LETTER E
---- U+E0046 TAG LATIN CAPITAL LETTER F
---- U+E0047 TAG LATIN CAPITAL LETTER G
---- U+E0048 TAG LATIN CAPITAL LETTER H
---- U+E0049 TAG LATIN CAPITAL LETTER I
---- U+E004A TAG LATIN CAPITAL LETTER J
---- U+E004B TAG LATIN CAPITAL LETTER K
---- U+E004C TAG LATIN CAPITAL LETTER L
---- U+E004D TAG LATIN CAPITAL LETTER M
---- U+E004E TAG LATIN CAPITAL LETTER N
---- U+E004F TAG LATIN CAPITAL LETTER O
---- U+E0050 TAG LATIN CAPITAL LETTER P
---- U+E0051 TAG LATIN CAPITAL LETTER Q
---- U+E0052 TAG LATIN CAPITAL LETTER R
---- U+E0053 TAG LATIN CAPITAL LETTER S
---- U+E0054 TAG LATIN CAPITAL LETTER T
---- U+E0055 TAG LATIN CAPITAL LETTER U
---- U+E0056 TAG LATIN CAPITAL LETTER V
---- U+E0057 TAG LATIN CAPITAL LETTER W
---- U+E0058 TAG LATIN CAPITAL LETTER X
---- U+E0059 TAG LATIN CAPITAL LETTER Y
---- U+E005A TAG LATIN CAPITAL LETTER Z
---- U+E005B TAG LEFT SQUARE BRACKET
---- U+E005C TAG REVERSE SOLIDUS
---- U+E005D TAG RIGHT SQUARE BRACKET
---- U+E005E TAG CIRCUMFLEX ACCENT
---- U+E005F TAG LOW LINE
---- U+E0060 TAG GRAVE ACCENT
---- U+E0061 TAG LATIN SMALL LETTER A
---- U+E0062 TAG LATIN SMALL LETTER B
---- U+E0063 TAG LATIN SMALL LETTER C
---- U+E0064 TAG LATIN SMALL LETTER D
---- U+E0065 TAG LATIN SMALL LETTER E
---- U+E0066 TAG LATIN SMALL LETTER F
---- U+E0067 TAG LATIN SMALL LETTER G
---- U+E0068 TAG LATIN SMALL LETTER H
---- U+E0069 TAG LATIN SMALL LETTER I
---- U+E006A TAG LATIN SMALL LETTER J
---- U+E006B TAG LATIN SMALL LETTER K
---- U+E006C TAG LATIN SMALL LETTER L
---- U+E006D TAG LATIN SMALL LETTER M
---- U+E006E TAG LATIN SMALL LETTER N
---- U+E006F TAG LATIN SMALL LETTER O
---- U+E0070 TAG LATIN SMALL LETTER P
---- U+E0071 TAG LATIN SMALL LETTER Q
---- U+E0072 TAG LATIN SMALL LETTER R
---- U+E0073 TAG LATIN SMALL LETTER S
---- U+E0074 TAG LATIN SMALL LETTER T
---- U+E0075 TAG LATIN SMALL LETTER U
---- U+E0076 TAG LATIN SMALL LETTER V
---- U+E0077 TAG LATIN SMALL LETTER W
---- U+E0078 TAG LATIN SMALL LETTER X
---- U+E0079 TAG LATIN SMALL LETTER Y
---- U+E007A TAG LATIN SMALL LETTER Z
---- U+E007B TAG LEFT CURLY BRACKET
---- U+E007C TAG VERTICAL LINE
---- U+E007D TAG RIGHT CURLY BRACKET
---- U+E007E TAG TILDE
---- U+E007F CANCEL TAG
Upvotes: 4