NULL
NULL

Reputation: 1589

mysql line breaks formatting

I have an application which has a always worked with no issues. Fast forward to today: all formatting is broken. Basically I am inserting a plain text emails to mysql db, something that has worked for more than 5 years because nothing has changed. In my php code the plain text looked like this:

hello [name],

How are you?

This is a test.

Thank you.

Ceo

Today I looked at the same php code containing the email, so this is just sitting there, like a file. Then I look at existing plain text of the email which has always been in the database and they both look like this:

hello [name],\r\n\r\n�How are you?\r\n\r\n�This is a test.\r\n\r\n�Thank you.\r\n\r\n�
Ceo

Now before I pull all my hair out, do you all know what happened in mysql db, on the browser, the server? (Oh and due to this, I am unable to get emails too.)

The glories of Monday.

Upvotes: 1

Views: 824

Answers (3)

NULL
NULL

Reputation: 1589

ok so i tried using this mysql_real_escape_string

now my email looks like this:

hello [name],\\n\\nHow are you?\\n\\nThis is a test.\\n\\nThank you.\\n\\nCeo

its adding extra slash to it.

my html/php code looks like this:

hello [name],\n\n

How are you?\n\n

This is a test.\n\n

Thank you.\n\n

Ceo

Upvotes: 0

Kneel-Before-ZOD
Kneel-Before-ZOD

Reputation: 4221

The database's (or table or column) encoding or collation has somehow been changed. If you want to verify, check that column's encoding, and compare it with the encoding of other columns without the problem.
Fortunately, it's easy to change the encoding to the proper format (within cPanel or PHPMyAdmin) without having to update the actual data.

I believe that latin1_swedish_ci is the default collation that causes no problems and utf-8 should be the encoding.

Hope this helps.

Upvotes: 0

Rob Starling
Rob Starling

Reputation: 3908

"�" has the following characters from latin-1 (iso-8859-1):

   303  195  C3    Ã    LATIN CAPITAL LETTER A WITH TILDE
   257  175  AF    ¯    MACRON
   302  194  C2    Â    LATIN CAPITAL LETTER A WITH CIRCUMFLEX
   277  191  BF    ¿    INVERTED QUESTION MARK
   275  189  BD    ½    VULGAR FRACTION ONE HALF

The byte sequence is, then C3 AF C2 BF C2 BD. This "smells" like UTF-8. Decoding (per https://en.wikipedia.org/wiki/UTF-8), we turn these into bit-patterns:

  • 11000011
  • 10101111
  • 11000010
  • 10111111
  • 11000010
  • 10111101

That first one (110xxxxx) indicates it's the first byte in a two-byte character, and stripping the marker bits from 11000011 10101111 yields ...00011 ..101111 or 00000000 00000000 00000000 11101111 == U+000000EF.

Similarly, the next two make ...00010 ..111111 or U+000000BF.

Then ...00010 ..111101 or U+000000BD.

U+00EF U+00BF U+00BD (per https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF) are "�", which is clearly not right.

However, this answer — https://stackoverflow.com/a/6544206/1105015 — seems to provide some insight. EF BF BD is the UTF-8 representation of the "replacement character" U+FFFD. So it looks like something way up the line got a character that confused your system, it was stored as the replacement character, and then eventually re-rendered as latin-1.

What i'd suggest looking closely at at this point is actually the encoding you use when inserting into the db. Maybe the only thing that changed is the MySQL client used for that?

Upvotes: 3

Related Questions