Reputation: 3
Currently we are using query aggregation for all of our SQL transaction. We are now looking to add BLOB support to the server, which would mean that we will have to convert the byte array to a string, so it can be sent with the query. We would like to have the BLOB data as an byte array on the server, but the server seems to miss some of the characters. Especially '' and '[newline]'.
This is, what I assume, because of SQL Server "Line Continuation". I have tried with different methods, and nothing seems to help. I am wondering if there is a way to tell my stored procedure to not do it or to stop this feature on the server or if there is other ways to bypass this that does not require any changes to the data.
We are using ASCII conversion on the client in c#
Encoding.ASCII.GetString(Blob, 0, Blob.Length).Replace("'", "''")
to make as few conversions as possible.
On the server we convert the string to varbinary
with
CONVERT(VARBINARY(MAX), (@BLOB))
I know using
Convert.ToBase64String(BLOB)
will stop this problem. This would, however, also require us to (Convert bytes to string on client)->(Convert string to bytes in SQL)->(Convert bytes to string either on server or client)->(Convert string to byte on client) for one blob to be uploaded and download to and from the server.
We can't go away from aggregation, as it is required to keep the servers running more smoothly, and saving the string would take more space than necessary.
Any knowledge about this is welcome
Edit
Many people in the comments are confused about this question, and I can understand it. I will try to answer to the why, even thou it does not have anything with my OG question to do. There will be a lot of typos, as I am doing this in my underwear during midnight in my bed.
We have testing software solution, which are here and there. They are mostly in factories on bad pc:s with very limited internet access. Each station can have a certain amount DUTs (Device Under Test) running, which all have a test performing on them. Each test have a certain amount of steps to perform. So there can be a station with 1 dut that has 13 test steps and there can be a station with 50 duts that has 40 tet steps. These can be located in China, poland, canada etc (the companies often have most in europe and a couple onanother continent).
We do not have any server side software, so all inserts becomes an injection directly (this is bad, we know, but it is 10 years old legacy code and everything is in rambles as I am the lone fulltimer on this project, while only working for 6 months and were never able to talk with the previous lone fulltimer because of natural causes). Because of this, and low funds for the project, I will not try to do any drastic changes.
So the problem, what I have heard, was that when having 50 threads trying to insert 13 separate step objects did not just hog up the servers connection, but also stop the production. The solution was "Query Aggregation" (I thought this was some SQL standard bullshit as I want to remember I have read in the code comments about is but I have not found it) which isbundling everything into a string that is then processed on the server. The server will then slice it up and send the variables to all the stored procedures (I am only the messenger, not the SQL wizard). And when doing this way we need to convert the Blob to a string too, as it is bound to a step in the test and pipeline for the process is like a labyrinth, so almost impossible to get the id back by changing the calls.
Now I want to change the blob byte array, in c#, with ascii, as SQL Server uses normal ascii conversion when I will cast it from varchar(max)
to a varbinary(max)
. The problem is that some characters mysteriously disappear, when doing so, as I can not convert from the byte array back to a zip when it is downloaded from the server. I have discovered, by comparing the ascii characters one by one, that three characters (or six bytes) has been removed from the array of bytes that has been on the server. In ASCII these three characters are '', '' and '\r', and they all appeared at the end of their rows.
By comparing this find with Microsofts own documentation on Line Continuation (which can be viewed here) I thought these cases seemed to be linked, which is why I would like to disable it as I do not use this functionality.
I can also use the
Convert.ToBase64String(BLOB)
as it has never generated this problem once, even when sending in a 50 MB file. It can be an coincidence, but I believe that it does something that prevents this scenario from happening (I want to remember that I took it from some brainiac that used this method for the same reason or something like that). The problem is 1. the string becomes very big (I believe 2,6 times bigger than ascii) and 2. the server doesn't convert this way. So when I cast it will assume that I have an ascii string, which means that the bytes on the server will represent an ascii string of a Base64String representing an byte array, which creates a lot of conversions.
We are on the decision step of if we should just skip converting it to a byte array on the server and save it as a varchar(max)
instead or if we should keep fighting to find a solution to keep it as a byte array so the users can just download it without any conversions. And by we I mean my mentor that got one foot in this project and I.
As mentioned before, any knowledge about this is welcome
Upvotes: 0
Views: 136
Reputation: 416111
This really scares me:
.Replace("'", "''")
It tells me you are NOT building your SQL correctly, and instead doing something incredibly dangerous. If you are worried about escaping single quotes for including an an SQL statement, whether for a blob field or otherwise, things have already gone horribly wrong.
Data should always be included in an SQL statement via parameterized queries. NEVER via string concatenation. Parameterized queries quarantine this data away from the rest of the statement, so there is no possibility of injection.
It's important because fixing the parameterization issue will almost certainly also fix the problem you're having with byte arrays.
In other words, say we have this simple query:
SELECT * FROM Students WHERE First_Name = 'Bobby';
But what you actually have is a variable firstName
that you want to use instead of the hard-coded Bobby
.
So you write this:
string sql = "SELECT * FROM Students WHERE First_Name = '" + firstName.Replace("'", "''") + "';";
using var cmd = new SqlCommand(sql, connection);
// ...
Unfortunately, the above code is WRONG. That is NOT OKAY. Even though you were smart enough to attempt to sanitize the input, if you're thinking "sanitize" rather than "quarantine" you've still missed the boat.
Instead, you MUST do something more like this:
string sql = "SELECT * FROM Students WHERE First_Name = @FirstName;";
using var cmd = new SqlCommand(sql, connection);
cmd.Parameters.Add("@FirstName", SqlDbType.NVarChar, 20).Value = firstName;
// ...
With this code, even the database server never directly substitutes the variable into the SQL statement. It's always kept completely separate, as a variable quarantined away from the command statement. We didn't even need to quote the parameter; it's just like we had declared a variable in an SQL script.
The great thing is the above technique works for byte arrays, too:
var MyArray = Encoding.ASCII.GetBytes("some text");
string sql = "INSERT INTO SomeTable (ByteArrayColumn) VALUES (@ByteArrayValue);";
using var cmd = new SqlCommand(sql, connection);
cmd.Parameters.Add("@ByteArrayValue", SqlDbType.VarBinary, -1).Value = MyArray;
// ...
And, of course, any ORM worth using will do this: Dapper, Entity Framework, NHibernate, etc, will all let you use parameterized queries.
The trick is, now you know about this, you need to back and UPDATE EVERY SINGLE QUERY IN YOUR APPLICATION!
Upvotes: 0
Reputation: 72229
OK I'm not going to go into the whole rat's nest, but you quite clearly have other problems apart from this, and you should strongly consider sweeping away whatever garabage SQL you have already in favour of some properly written code. It's almost certainly not necessary to "aggregate" the data and then break it up again, when you can just use separate parameters and/or table value parameters.
Be that as it may, ideally you would parameterize your query properly. For example, in C# you would do as follows
command.Parameters.Add("@myBlob", SqlDbType.VarBinary, -1).Value = someByteArray;
And then the SQL can refer to that @myBLOB
variable directly. This is the safest and most performant method of transferring a byte array to SQL Server, and I recommend you always use this method
If you cannot parameterize your query for whatever reason, a safe method of injection is necessary, which avoids any issues of encoding or misinterpretation. You can pass a hex string like this (C# again)
var blobString = Convert.ToHexString(someByteArray);
var query = $"
DECLARE @myBlob varbinary(max) = 0x{blobString};
INSERT ...
Upvotes: 1