Reputation: 21
Hello, I am trying to compile an EPUB v2.0 from HTML code extracted from InDesign. I have noticed a lot of "special characters" at either the beginning or the end of paragraphs. For example:
<p class="text_indent0px font_size0_8em line_height1_325 margin_bottom1px margin_left0px margin_right0px sans_serif floatleft">E<span class="small_caps">VELYNE</span>&#9;</p>
What is this &#9; and can I either get rid of it or replace it with a "&nbsp;"?
Upvotes: 2
Views: 2531
Reputation: 11182
There are four types of character reference scheme in use: decimal (&#[0-9]+;), hexadecimal (&#x[a-f0-9]+;), HTML named (&[a-z]+;), and the actual character (.).
All of these are rendered the same way; only the coding style differs. For example, if you need to display a Latin small letter e with diaeresis, you could use any of the conventions below:
&#235; (decimal notation), &#xEB; (hexadecimal notation), &euml; (HTML notation), ë (actual character).
Likewise, as you said, the choice is between (a) &#9; (decimal notation), (b) &nbsp; (HTML notation), or (c) &#32; (decimal notation).
So, from the above analogy, (a), (b) and (c) are notations for three different characters.
For your information, (a) is a horizontal tab, (b) is the non-breaking space, which is &#160; in decimal notation, and (c) is the decimal notation for the normal space character.
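To see the difference between notations and characters for yourself, here is a minimal Python sketch using the standard library's html.unescape. The variable names are only illustrative; the point is that the three spellings of ë decode to one character, while the three whitespace references decode to three different ones.

```python
import html

# Three spellings of the same character: decimal, hexadecimal, and named.
forms = ["&#235;", "&#xEB;", "&euml;"]
decoded = [html.unescape(form) for form in forms]
all_same = all(ch == "ë" for ch in decoded)

# The three references discussed above decode to three different characters.
whitespace = {ref: html.unescape(ref) for ref in ("&#9;", "&nbsp;", "&#32;")}
code_points = {ref: ord(ch) for ref, ch in whitespace.items()}

print(all_same)
print(code_points)
```

The code points come out as 9 (tab), 160 (non-breaking space) and 32 (space), which matches the decimal notations listed above.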
Now, technically, whitespace at the end of a paragraph is meaningless, so you could simply discard all of it. If you still need that whitespace to be preserved, use it inside <pre> elements, not in <p> or <div>.
Hope this helps...
Upvotes: 0
Reputation: 1908
&nbsp; is the entity used to represent a non-breaking space.
&#32; is the decimal character code of the space entered with the keyboard spacebar.
&#9; is the decimal character code of the horizontal tab.
All three represent whitespace, but &nbsp; is non-breaking, which means multiple sequential occurrences will not be collapsed into one, whereas in the same case &#32; will collapse to one space.
&#9; = approx. 4 &nbsp; spaces and approx. 8 &#32; spaces (only where whitespace is preserved, such as inside <pre>).
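The collapsing behaviour can be approximated in a few lines of Python. This is only a rough sketch of what a browser does under the default white-space handling, not a full implementation of the CSS rules, and collapse_whitespace is a hypothetical helper name.

```python
import html
import re


def collapse_whitespace(text: str) -> str:
    """Roughly mimic default HTML rendering: runs of space, tab, and
    newline become a single space. The character class is explicit
    because Python's \\s would also match the non-breaking space."""
    return re.sub(r"[ \t\r\n]+", " ", text)


plain = collapse_whitespace(html.unescape("a&#32;&#32;&#32;b"))
tabbed = collapse_whitespace(html.unescape("a&#9;&#9;b"))
nbsp = collapse_whitespace(html.unescape("a&nbsp;&nbsp;&nbsp;b"))

print(repr(plain), repr(tabbed), repr(nbsp))
```

Runs of ordinary spaces and tabs both end up as a single space, while the three non-breaking spaces survive untouched.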
Upvotes: 0
Reputation: 92274
In the HTML encoding &#{number};, {number} is the character code (the same as the ASCII code for the values discussed here). Therefore, &#9; is a tab, which typically condenses down to one space in HTML unless you use CSS (or the <pre> tag) to treat it as preformatted text.
Therefore, it's not safe to replace it with a non-breaking or a regular space unless you can guarantee that it's not being displayed as a tab anywhere.
div:first-child {
  white-space: pre;
}
<div>&#9; Test</div>
<div>&#9; Test</div>
<pre>&#9; Test</pre>
See https://developer.mozilla.org/en-US/docs/Web/CSS/white-space and http://ascii.cl/
Upvotes: 0
Reputation: 172408
&#9; represents the horizontal tab.
Similarly, &#32; represents a space.
To replace &#9;, you have to use &nbsp;.
Upvotes: 0
Reputation: 6412
That would be a horizontal tab (i.e. the same as using the tab key).
If you want to replace it, I would suggest doing a find/replace using an ePub editor like Sigil (http://sigil-ebook.com/).
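If you would rather script the find/replace than do it by hand, here is a minimal Python sketch. It assumes the tab shows up as a character reference (&#9; or &#x9;) or a literal tab directly before a closing </p>, as in the question; strip_trailing_tabs and the shortened sample markup are hypothetical. It only touches the end of paragraphs, and it is only safe where the text is not displayed as preformatted.

```python
import re

# One or more tab references (decimal or hex) or literal tabs that sit
# directly before a closing </p> tag, allowing whitespace in between.
TRAILING_TAB = re.compile(r"(?:&#0*9;|&#[xX]0*9;|\t)+(?=\s*</p>)")


def strip_trailing_tabs(markup: str) -> str:
    """Remove tab references at the end of paragraphs (hypothetical helper)."""
    return TRAILING_TAB.sub("", markup)


sample = '<p class="floatleft">E<span class="small_caps">VELYNE</span>&#9;</p>'
cleaned = strip_trailing_tabs(sample)
print(cleaned)
```

The same pattern can be pasted into a regex-capable search box in an editor, with an empty replacement.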
Upvotes: 0
Reputation: 7827
&#9; is the ASCII code for a tab, so I guess the paragraphs were indented with tabs.
If you want to replace them with &nbsp;, then use 4 of them.
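As a rough sketch of that replacement in Python, assuming every tab in the markup is spelled exactly as the reference &#9; (tabs_to_nbsp and its width parameter are made-up names for illustration):

```python
def tabs_to_nbsp(markup: str, width: int = 4) -> str:
    """Replace each &#9; reference with `width` non-breaking spaces
    (hypothetical helper; the default of 4 follows the suggestion above)."""
    return markup.replace("&#9;", "&nbsp;" * width)


indented = tabs_to_nbsp("<p>&#9;Text</p>")
print(indented)
```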
Upvotes: 1