Slinky

Reputation: 5832

Line Feeds and Carriage Returns in Data: 0D 0A

I am writing a data cleanup script (MS Smart Quotes, etc.) that will operate on MySQL tables encoded in Latin1. While scanning the data I noticed a ton of 0D 0A byte pairs where the line breaks are.

Since I am cleaning the data, should I also deal with all of the 0D bytes by removing them? Is there ever a good reason to keep 0D (carriage return) anymore?

Thanks!

Upvotes: 1

Views: 18826

Answers (3)

SDsolar

Reputation: 2705

Python's readline() returns a line followed by \012. A backslash followed by digits is an octal escape, and octal 12 is decimal 10. You can see in the ASCII table that decimal 10 is NL or LF: newline or line feed.

This is the standard end-of-line character in a Unix text or script file.

http://www.asciitable.com/

So be aware that len() will include the NL. Unless you try to read past EOF, the len() of a line will never be zero, because even a blank line contains the NL.

Therefore, if you INSERT any line of text obtained from Python's readline() into a MySQL table, it will include the NL character at the end by default.
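A minimal sketch of this behaviour, using io.StringIO as a stand-in for a real file. The helper name read_lines_with_lengths and the sample text are made up for illustration; stripping with rstrip("\r\n") is one possible way to drop the terminator before an INSERT, not the only one.

```python
import io

def read_lines_with_lengths(text):
    """Return (raw_line, raw_length, stripped_line) for each readline() call.

    Hypothetical helper for illustration only.
    """
    stream = io.StringIO(text)  # stand-in for an open text file
    results = []
    while True:
        line = stream.readline()
        if line == "":  # readline() returns an empty string only at EOF
            break
        # rstrip("\r\n") removes any trailing CR and LF characters
        results.append((line, len(line), line.rstrip("\r\n")))
    return results

rows = read_lines_with_lengths("first\n\nlast\r\n")
for raw, length, stripped in rows:
    print(repr(raw), length, repr(stripped))
```

The blank line in the middle still has length 1, and the last line keeps its 0D 0A pair until it is stripped.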

Upvotes: 0

WWW

Reputation: 9860

The CR/LF combination is a Windows thing. *NIX operating systems just use LF. So based on the application that uses your data, you'll need to decide whether you want or need to filter out CRs. See the Wikipedia entry on newline for more info.
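Before deciding, it may help to see which line-ending styles the data actually contains. The following is a rough sketch that assumes the column values are available as raw bytes; count_line_endings is a hypothetical helper, not part of any library.

```python
import re

def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare CRs, and bare LFs in a byte string."""
    crlf = data.count(b"\r\n")
    # a CR that is not immediately followed by LF
    cr_only = len(re.findall(rb"\r(?!\n)", data))
    # an LF that is not immediately preceded by CR
    lf_only = len(re.findall(rb"(?<!\r)\n", data))
    return {"crlf": crlf, "cr_only": cr_only, "lf_only": lf_only}

counts = count_line_endings(b"one\r\ntwo\nthree\rfour\r\n")
print(counts)
```

If only the crlf count is non-zero, the data most likely came from Windows tools and can be converted in one consistent step; a mix of styles suggests a closer look first.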

Upvotes: 1

Devart

Reputation: 122002

0D0A (\r\n) and 0A (\n) are line terminators; \r\n is mostly used on Windows, \n on Unix systems.

Is there ever a good reason to keep 0D anymore?

I think you should answer this question yourself. You could remove '\r' from the data, but first make sure that the programs that will use this data treat a bare '\n' as the end of a line. Most of them do, but check just in case.
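If you do decide to drop the carriage returns, the conversion could look something like the sketch below. The function name normalize_newlines and the keep_bare_cr flag are assumptions for illustration; turning a lone '\r' into '\n' rather than deleting it is a design choice made here so that old CR-only line breaks are not silently merged into one line.

```python
def normalize_newlines(text: str, keep_bare_cr: bool = False) -> str:
    """Convert CRLF to LF; optionally leave lone CR characters untouched."""
    # handle the two-character pair first so it becomes a single LF
    text = text.replace("\r\n", "\n")
    if not keep_bare_cr:
        # any CR left at this point is a lone CR; treat it as a line break
        text = text.replace("\r", "\n")
    return text

cleaned = normalize_newlines("a\r\nb\rc\n")
print(repr(cleaned))
```

Running this on a copy of the table first, and comparing row counts and a few sample values, is a cheap way to confirm that downstream programs still read the data correctly.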

Upvotes: 4
