cyfdecyf

Reputation: 836

Prevent or work around browsers converting '\n' between lines into a space (for Chinese characters)

Converting a newline into a space makes sense for English. Take the following HTML, for example:

<p>
This is
a sentence.
</p>

After the browser converts the newline into a space, we get the following:

This is a sentence.

This is good for English, but not for Chinese, because Chinese does not use spaces to separate words. Here's an example (the Chinese sentence has the same meaning as "This is a sentence"):

<p>
这是
一句话。
</p>

I get the following result on Chrome, Safari and IE...

这是 一句话。

...but what I wanted is the following, without the extra space:

这是一句话。

I don't know why browsers do not ignore the newline when the last character of the current line and the first character of the next line are both Chinese characters (which I think would make more sense). Or do they provide such a mechanism, but it needs special handling?

BTW, in Vim, when using "J" to join lines, no space is added if the last character of the first line and the first character of the second line are both Chinese characters. For English, a space is added. So I guess Vim does some special handling for this.

UPDATE:

Though I think this is an issue with the browser, I have to live with it. So for now I preprocess my Markdown text to join Chinese lines before generating HTML. Here's how I do this in Ruby; the complete code, which also handles Chinese punctuation, is on gist:

# encoding: UTF-8

# Requires Ruby 1.9.x and assumes UTF-8 encoded input

class String
  # The regular expression trick to match CJK characters comes from
  # http://stackoverflow.com/a/4681577/306935
  def join_chinese
    # Lookarounds leave the neighbouring characters unconsumed, so
    # consecutive one-character lines are joined as well.
    gsub(/(?<=\p{Han})\n(?=\p{Han})/, '')
  end
end
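
For readers who want the punctuation case as well, here is a minimal standalone sketch of the same preprocessing idea. It is not the code from the gist: the name join_cjk_lines and the punctuation ranges (the CJK Symbols and Punctuation block plus the fullwidth ASCII variants) are assumptions chosen for illustration, and real text may need a wider or narrower set.

```ruby
# encoding: UTF-8

# Hypothetical character class (an assumption, not from the gist):
# Han ideographs, CJK symbols/punctuation, and fullwidth ASCII variants.
CJK_CHAR = /[\p{Han}\u3000-\u303F\uFF01-\uFF5E]/

# Removes a newline only when the characters on both sides match CJK_CHAR.
# Newlines next to Latin text are left alone, so the browser still turns
# them into the word space that English needs.
def join_cjk_lines(text)
  text.gsub(/(?<=#{CJK_CHAR})\n(?=#{CJK_CHAR})/, '')
end

puts join_cjk_lines("这是\n一句话。")
puts join_cjk_lines("This is\na sentence.")
```

The lookbehind and lookahead only test the neighbouring characters without consuming them, which is why a run of very short lines is still joined in a single pass.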

Upvotes: 20

Views: 4088

Answers (4)

Lei Zhao

Reputation: 1156

So far, the shortest way I know to achieve this effect is to break the line after an opening tag. But you don't want to insert an extra tag in your source, so it would be nice if there were a tag that does nothing. Actually, there exists one: the comment. The newline sits inside the comment, so it never becomes part of the text.

<p>
这是<!--
-->一句话。
</p>

This gives you the following.

这是一句话。

Source of Inspiration: No extra space

Upvotes: 4

Florian Rappl

Reputation: 3189

There is a way to solve this problem (a classic workaround). To keep (current) browsers from showing the line break as whitespace, you have to set the font-size of the container to 0, so that the space takes up no width.

For the child elements you then have to set the font-size back to its original value. For your code, an example would be:

<p class="nowhitespace">
  <span>这是</span>
  <span>一句话。</span>
</p>

The CSS could contain code like the following:

.nowhitespace { font-size: 0; }
.nowhitespace > span { font-size: 16px; }

Upvotes: 5

Jukka K. Korpela

Reputation: 201568

Browsers treat newlines as spaces because the specifications say so, ever since HTML 2.0. In fact, HTML 2.0 was milder than later specifications; it said: “An HTML user agent should treat end of line in any of its variations as a word space in all contexts except preformatted text.” (Conventional Representation of Newlines), whereas newer specifications state this more strongly (describing it as what happens in HTML).

The background is that HTML and the Web were developed mainly with Western European languages in mind; this is reflected in many features of the original specifications and early implementations. Only slowly have they been internationalized.

It is unlikely that the parsing rules will be changed. More likely, rendering might become sensitive to language or character properties. This would mean that a line break still gets taken as a space (and the DOM string will contain an ASCII space character), but a string like 这是 一句话。 would be rendered as if the space were not there. This is what the HTML 4.01 specification seems to refer to (White space). The text is somewhat confused, but I think it tries to say that the behavior would depend on the content language, either inferred by the browser or declared in markup.

But browsers don’t do such things yet. Declaring the language of content, e.g. <html lang=zh>, is a good principle but has little practical impact—in rendering, it may affect the browser’s choice of a default font (but how many authors let browsers use their default fonts?). It may even result in added spacing, if the space character happens to be wider in the browser’s default font for the language specified.

According to the CSS3 Text draft, you could use the text-spacing property. The value none “Turns off all text-spacing features. All fullwidth characters are set with full-width glyphs.” Unfortunately, no browser seems to support this yet.

Upvotes: 13

user613857

You can use <pre> tags for preformatted text, and you can change its style as well. Preformatted text takes newline characters literally and renders each one as a new line.

If you don't want <pre>:

A newline character is also considered whitespace. When you insert one, the browser treats the following line as part of the previous line and simply replaces the newline with a space.

To get an actual line break in HTML you must declare it explicitly; just use <br>.

Upvotes: -3
