john c. j.

Reputation: 1185

Zero-width space vs zero-width non-joiner

What is the difference between the zero-width space (U+200B) and the zero-width non-joiner (U+200C) from a practical point of view?

I have already read the Wikipedia articles, but I can't tell whether these characters are interchangeable or not.

I think they are completely interchangeable, but then I can't understand why Unicode has two characters instead of one.

Upvotes: 7

Views: 4865

Answers (2)

Liz

Reputation: 3043

A zero-width non-joiner (ZWNJ) only prevents adjacent characters from joining, for example into a ligature. Ligatures are hard to notice in the Latin alphabet, where they show up mostly in serif fonts for specific combinations of lowercase letters, such as "fi" below. Some widely used writing systems, such as the Arabic abjad, use ligatures very prominently.

[Image: example of the "fi" ligature]

A zero-width space (ZWSP) also interrupts ligatures, and in addition it creates a line-break opportunity. That makes it very good for displaying file paths and long URLs, but beware that it sometimes breaks copy-pasting, because the invisible character is copied along with the visible text.
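As a rough sketch of the file-path use case, the snippet below inserts a ZWSP after every path separator so that a renderer may wrap the line there. The helper names `add_break_opportunities` and `strip_zwsp` and the sample path are made up for illustration. It also shows why copy-pasting can go wrong: the displayed string is no longer equal to the real path until the ZWSPs are removed.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def add_break_opportunities(path, separators="/\\"):
    """Insert a ZWSP after each separator so a renderer may wrap there."""
    return "".join(ch + ZWSP if ch in separators else ch for ch in path)

def strip_zwsp(text):
    """Remove every ZWSP, e.g. before using pasted text as a real path."""
    return text.replace(ZWSP, "")

# Hypothetical sample path, used only for this illustration.
path = "/usr/local/share/very/long/path.txt"
display = add_break_opportunities(path)

print(display == path)              # the invisible characters make it differ
print(strip_zwsp(display) == path)  # stripping them restores the original
```

Whether the line actually wraps at those points is up to the renderer; the ZWSP only marks where a break is allowed.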

By the way, I tested regular expression matching in Python 3.8 and JavaScript 1.5, and neither character matches \s in either of them. Unicode classifies these characters as format characters (similar to direction markers and such) rather than as spaces or punctuation. There are other code points in the same Unicode block (e.g. Thin Space, U+2009) that Unicode does classify as spaces, and those do match \s.
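A small check along these lines, using only Python's standard `re` and `unicodedata` modules, reproduces the test for the Python side. The helper name `describe` is made up for this sketch, and the categories reported depend on the Unicode database bundled with the Python version in use.

```python
import re
import unicodedata

ZWSP = "\u200b"        # ZERO WIDTH SPACE
ZWNJ = "\u200c"        # ZERO WIDTH NON-JOINER
THIN_SPACE = "\u2009"  # THIN SPACE

def describe(ch):
    """Return (general category, whether \\s matches, whether str.isspace() is true)."""
    return (
        unicodedata.category(ch),
        re.fullmatch(r"\s", ch) is not None,
        ch.isspace(),
    )

for ch in (ZWSP, ZWNJ, THIN_SPACE):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), describe(ch))
```

The general category "Cf" stands for format characters and "Zs" for space separators.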

Upvotes: 7

gnasher729

Reputation: 52602

A zero-width non-joiner is almost nonexistent. Its only purpose is to split things in two. For example, 123 zero-width-non-joiner 456 is two numbers with nothing in between.

A zero-width space is a space character, just a very, very narrow one. For example, 123 zero-width-space 456 is two numbers with a space character in between.

Upvotes: 6
