Escape characters getting inserted into database

I am inserting arbitrary binary data into a mysql database MEDIUMBLOB using the below code. I am writing the same data to a file, from the same program. I then create a file from the DB contents:

select data from table where tag=95 order by date, time into outfile "dbout";

I then compare the output written directly to the file to the output in dbout. There are escape (0x5c, '\') characters before some bytes in the dbout file (e.g. before 0x00). This garbles the output from the database. My understanding was that by using a MEDIUMBLOB and prepared statements, I could avoid this problem. Initially I was using mysql_real_escape_string with a regular INSERT, and having the problem. Nothing seems to be fixing this.

void
insertdb(int16_t *data, size_t size, size_t nmemb)
{
    int16_t *fwbuf; // I have also tried this as char *fwbuf
    unsigned long i;
    struct tm *info;
    time_t rawtime;
    char dbuf[12];
    char tbuf[12];

    if(fwinitialized==0){
        fwbuf = malloc(CHUNK_SZ);
        fwinitialized = 1;
    }

    if(fwindex + (nmemb*size) + 1 >= CHUNK_SZ || do_exit == 1){
        MYSQL_STMT *stmt = mysql_stmt_init(con);
        MYSQL_BIND param[1];

        time(&rawtime);
        info = localtime(&rawtime);
        snprintf(dbuf, 16, "%d-%02d-%02d", 1900+info->tm_year, 1+info->tm_mon, info->tm_mday);
        snprintf(tbuf, 16, "%02d:%02d:%02d", info->tm_hour, info->tm_min, info->tm_sec);

        char *tmp = "INSERT INTO %s (date, time, tag, data) VALUES ('%s', '%s', %d, ?)";
        int len = strlen(tmp)+strlen(db_mon_table)+strlen(dbuf)+strlen(tbuf)+MAX_TAG_LEN+1;
        char *sql = (char *) malloc(len);
        int sqllen = snprintf(sql, len, tmp, db_mon_table, dbuf, tbuf, tag);

        if(mysql_stmt_prepare(stmt, sql, strlen(sql)) != 0){
            printf("Unable to create session: mysql_stmt_prepare()\n");
            exit(1);
        }

        memset(param, 0, sizeof(param));
        param[0].buffer_type = MYSQL_TYPE_MEDIUM_BLOB;
        param[0].buffer = fwbuf;
        param[0].is_unsigned = 0;
        param[0].is_null = 0;
        param[0].length = &fwindex;

        if(mysql_stmt_bind_param(stmt, param) != 0){
            printf("Unable to create session: mysql_stmt_bind_param()\n");
            exit(1);
        }

        if(mysql_stmt_execute(stmt) != 0){
            printf("Unabel to execute session: mysql_stmt_execute()\n");
            exit(1);
        }

        printf("closing\n");
        mysql_stmt_close(stmt);

        free(sql);
        fwindex = 0;

    } else {
        memcpy((void *) fwbuf+fwindex, (void *) data, nmemb*size);
        fwindex += (nmemb*size);
    }
}

So, why the escape characters in the database? I have tried a couple of combinations of hex/unhex in the program and when creating the file from msyql. That didn't seem to help either. Isn't inserting arbitrary binary data into a database be a common thing with a well-defined solution?

P.S. - Is it ok to have prepared statements that open, insert, and close like this, or are prepared statements generally for looping and inserting a bunch of data before closing?

PPS - Maybe this is important to the problem: When I try to use UNHEX like this:

select unhex(data) from table where tag=95 order by date, time into outfile "dbout";

the output is very short (less than a dozen bytes, truncated for some reason).

Upvotes: 1

Views: 1563

Answers (1)

abligh
abligh

Reputation: 25129

As MEDIUMBLOB can contain any character (even an ASCII NUL) MySQL normally escapes the output so you can tell when fields end. You can control this using ESCAPED BY. The documentation is here. Below is an excerpt. According to the last paragraph below (which I've put in bold), you can entirely disable escaping. I have never tried that, for the reason in the last sentence.

FIELDS ESCAPED BY controls how to write special characters. If the FIELDS ESCAPED BY character is not empty, it is used when necessary to avoid ambiguity as a prefix that precedes following characters on output:

  • The FIELDS ESCAPED BY character

  • The FIELDS [OPTIONALLY] ENCLOSED BY character

  • The first character of the FIELDS TERMINATED BY and LINES TERMINATED BY values

  • ASCII NUL (the zero-valued byte; what is actually written following the escape character is ASCII "0", not a zero-valued byte)

The FIELDS TERMINATED BY, ENCLOSED BY, ESCAPED BY, or LINES TERMINATED BY characters must be escaped so that you can read the file back in reliably. ASCII NUL is escaped to make it easier to view with some pagers.

The resulting file does not have to conform to SQL syntax, so nothing else need be escaped.

If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.

A better strategy (if you only need one blob in the output file) is SELECT INTO ... DUMPFILE, documented on the same page, per the below:

If you use INTO DUMPFILE instead of INTO OUTFILE, MySQL writes only one row into the file, without any column or line termination and without performing any escape processing. This is useful if you want to store a BLOB value in a file.

Upvotes: 1

Related Questions