Ryan Killian

Reputation: 95

How can I correct for bad text encoding?

A little background. We have terrible H1B-developed systems at work. They're 50% SQL, 50% JS, 7 layers of Entity-like boilerplate in between, and in a constant state of emergency. Some deployments take hours because they're pushing literally hundreds of DB scripts each time.

As a short term fix, I wrote a program to pump the script directory through a SqlCommand object.

The issue I'm having is that they're pasting text from incompatible code pages together. The file looks like ASCII but has some lines with Unicode spaces. When it's read in and executed, it errors out with garbage characters. I had switched from autodetect to the default encoding, which worked for about a day before they did something different and it started erroring out again.

SQL Server Management Studio flags these weird characters too but still manages to execute the scripts. Is there any way to force the text to be "normalized" somehow? Or force it through whatever SSMS does?

Upvotes: 0

Views: 194

Answers (1)

Damien_The_Unbeliever

Reputation: 239824

If the actual contents of these files are all meant to be in the 7-bit ASCII range, then you can try reading the files as binary. You'll want to strip off any leading Unicode BOM you encounter and then skip any bytes which are 0 (which will come from files encoded as UTF-16). Then feed the result to a decoder and claim that it's ASCII or UTF-8.
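
The steps above could be sketched roughly as follows. This is only an illustration in Python, not the asker's actual tool (which appears to be .NET), and the function name normalize_ascii_script is hypothetical. It also assumes that replacing any leftover undecodable bytes with U+FFFD is acceptable; a stricter version could raise an error instead.

```python
import codecs

# Known byte order marks, longest first so a UTF-32 BOM is not
# mistaken for the UTF-16 BOM it starts with.
BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def normalize_ascii_script(raw: bytes) -> str:
    """Hypothetical helper: recover ASCII text from mixed-encoding bytes."""
    # Strip one leading BOM, if present.
    for bom in BOMS:
        if raw.startswith(bom):
            raw = raw[len(bom):]
            break
    # Skip zero bytes, which is how the padding of ASCII characters
    # shows up in UTF-16 encoded sections.
    raw = raw.replace(b"\x00", b"")
    # Assumption: bytes that still do not decode become U+FFFD
    # rather than raising an exception.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "SELECT 1;".encode("utf-16-le")
    print(normalize_ascii_script(sample))
```

Note that this only handles a BOM at the very start of the data, as described above. It also only makes sense when the intended content really is 7-bit ASCII.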

If the character set is wider than the 7-bit ASCII range then I think all bets are off, and you need to solve the real problem, which sounds like it's a people problem rather than a technical one.

Upvotes: 1
