Reputation: 5958
I've noticed a glitch in GitHub Markdown, the VS Code Markdown extension, and other places too. It behaves particularly oddly on GitHub and when using git.
Very frequently, when I type headings such as `# heading` or `## sub-heading`, the heading does not render correctly. Here is an example commit for a Markdown file:
Source diff:
Rich diff:
As you can see, the rich diff isn't rendering correctly, and neither is the file when I go into "browse files" (regardless of what computer/device I use):
Somehow, after deleting the space character after `###` and re-typing it, there were changes to be committed. As far as I know this shouldn't happen (because nothing has actually changed, I just re-typed the space character). But I committed it anyway and got the following diff:
As you can see, the space character is highlighted. Now I magically get the following rich diff, which shows the heading:
And now, when I "browse files", the heading shows on every computer I use:
This happens to me a lot, and I'm wondering why it happens, how git is even able to commit no change, and whether there is a way to solve it.
This is definitely not just me because others have mentioned this to me too in the past.
Note: My GitHub repo is private so I can't share a link, but it should be easy to reproduce.
Update
I opened the revision with the issue inside HxD and got the following hex output:
I then replaced the space character inside VSCode and got the following hex output:
There's an extra Â character that is not shown in VS Code and that I didn't input. I've had this problem on both Windows and macOS.
Update 2
Both ASCII and UTF-8 define the character as Â, so I can't figure out why it's not showing up in VS Code or the GitHub text editor.
I've also seen ASCII defining it as the following on https://www.asciitable.com/
Upvotes: 2
Views: 4713
Reputation: 11
Another cause may be an improperly named file or missing .md file extension.
Upvotes: -2
Reputation: 387517
The byte sequence `0xC2 0xA0` is the UTF-8 encoding of the character U+00A0 NO-BREAK SPACE. So it is a non-breaking space character, which explains why it looks like a space in editors and shows up as a difference when compared to a simple space.
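To check this yourself, here is a minimal Python sketch that decodes the two bytes and compares the result with an ordinary space. The variable names are only illustrative.

```python
import unicodedata

# The two bytes seen in the hex dump.
raw = bytes([0xC2, 0xA0])

# Decoded as UTF-8 they form a single character.
ch = raw.decode("utf-8")

print(len(ch))               # 1
print(hex(ord(ch)))          # 0xa0
print(unicodedata.name(ch))  # NO-BREAK SPACE

# It looks like a space but is not equal to one, which is why git sees a change.
print(ch == " ")             # False
print(" ".encode("utf-8"))   # b' '
```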
The fact that it shows up as Â within the hex editor is simply because hex editors typically render the text column with a single-byte encoding, looking at one byte at a time, and the byte `0xC2` on its own is Â in encodings such as Latin-1 or Windows-1252. So they don’t decode multi-byte sequences like this one, which UTF-8 requires to encode characters outside of the ASCII range.
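The effect can be reproduced by decoding the same bytes in two ways. This is a sketch that assumes the hex editor's text column behaves like Latin-1; the sample heading bytes are made up for illustration.

```python
# A heading line where the "space" after ### is really the bytes C2 A0.
raw = b"###\xc2\xa0Heading"

# One byte per character, roughly what a hex editor's text column does.
as_single_bytes = raw.decode("latin-1")

# Proper UTF-8 decoding, which is what VS Code and GitHub do.
as_utf8 = raw.decode("utf-8")

print(as_single_bytes[3])    # Â  (byte 0xC2 shown on its own)
print(len(as_single_bytes))  # 12 characters, one per byte
print(len(as_utf8))          # 11 characters, C2 A0 collapsed into U+00A0
```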
As for why the non-breaking space breaks the Markdown parser, this is expected if the parser conforms to the CommonMark specification. According to it, the opening `#` sequence of an ATX heading must be followed by a space or tab (or by the end of the line), where a space is explicitly defined to be a U+0020 SPACE character.
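As for a way to solve it, one option is to replace the stray characters in the affected files. The following is only a rough sketch: the function name `fix_heading_spaces` is hypothetical, and it does not skip fenced code blocks, so review the result before committing.

```python
import re

NBSP = "\u00a0"

# Up to three leading spaces, one to six '#', then one or more NBSPs.
_HEADING_NBSP = re.compile("^( {0,3}#{1,6})" + NBSP + "+", re.MULTILINE)


def fix_heading_spaces(text: str) -> str:
    """Replace NBSPs directly after an ATX heading marker with one U+0020 space."""
    return _HEADING_NBSP.sub(r"\1 ", text)


sample = "###" + NBSP + "Title\nbody" + NBSP + "text\n"
fixed = fix_heading_spaces(sample)

print(fixed.splitlines()[0] == "### Title")  # True
print(NBSP in fixed)                         # True, NBSPs in body text are left alone
```

To apply it to a file, read the file as UTF-8, pass the contents through the function, and write the result back with the same encoding.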
Upvotes: 5