Importing Image From MSSQL to PostgreSQL

Question

I have a set of images exported from MSSQL to a CSV file. The file is 1 GB. The column's data type in MSSQL is image. When I try to import it into Postgres, into a column of type bytea, an error occurs:

ERROR: invalid byte sequence for encoding "UTF8": 0xff
CONTEXT: COPY photo, line 1

When I look into the CSV file, the image data looks like this:

0xFFD8FFE000104A46494600010101006000600000FFE1...

My questions:

What data type in PostgreSQL can be used to import this kind of data?
How can I retrieve an image from this kind of data using Postgres and PHP?

What I have tried:

I copied just three lines into a new CSV file and imported it into the photo table, and that succeeded. Strange: why does the error occur only when I import the whole CSV file?
I tried https://stackoverflow.com/a/22211207/3602791 in my PHP code with a sample image and it worked, but when I tried to retrieve the three images that I had imported, it failed, saying that the image contains errors.

http://pastebin.com/WrfjFqY6 This is a sample line from the CSV. It has 2 columns, id and photo.

Does anyone know how to solve this? Thanks in advance.

Craig Ringer · Accepted Answer

As yenyen notes in the comments, the issue was that the input was UCS-2 (probably really UTF-16) encoded.

UCS-2 is a two-byte-per-character encoding, so plain ASCII text stored in it is full of null bytes. If you tell PostgreSQL the file is UTF-8 then it'll see the input as garbage full of invalid UTF-8 sequences. If you tell PostgreSQL it's a simple 1-byte encoding like latin-1, PostgreSQL will see the zero (null) byte and realise it's not latin-1 after all.
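Both failure modes can be reproduced in a few lines of Python. This is only a sketch: the sample row is made up, and it assumes the export was little-endian UTF-16 with a byte order mark, which would match the 0xff in the error message but is not confirmed by the question.

```python
import codecs

# Hypothetical sample row: an id and the start of a hex-encoded JPEG.
row = "1,0xFFD8FFE0\n"

# Assumption: the export is UTF-16 little-endian with a leading BOM (FF FE).
raw = codecs.BOM_UTF16_LE + row.encode("utf-16-le")

# The very first byte is 0xff, which can never start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    first_bad_byte = None
except UnicodeDecodeError as exc:
    first_bad_byte = raw[exc.start]

# Every ASCII character is followed by a zero byte, so a 1-byte encoding
# such as latin-1 runs into null bytes straight away.
null_count = raw.count(b"\x00")

print(hex(first_bad_byte), null_count)

# Decoding with the right codec recovers the original text (BOM removed).
assert raw.decode("utf-16") == row
```

The hex string inside the row is ordinary ASCII text; it is the encoding of the file around it that trips up COPY.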

The trick here is to examine the input file with an editor that can show the raw bytes, not just use a text editor that automagically reads the BOM and loads it as encoded text. If in doubt use a hex editor.
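If no hex editor is at hand, the first few raw bytes can also be checked with a short script. The following is a minimal sketch, not part of the original answer: `detect_bom` and `transcode_to_utf8` are hypothetical helper names, and rewriting the file as UTF-8 before running COPY is one possible follow-up rather than something the answer prescribes.

```python
import codecs
import shutil

# Byte order marks to look for. The UTF-32 marks are checked first because
# the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(head: bytes):
    """Return a Python codec name if `head` starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None


def transcode_to_utf8(src_path, dst_path):
    """Rewrite src_path as BOM-less UTF-8; return the detected codec or None."""
    with open(src_path, "rb") as f:
        head = f.read(4)  # raw bytes, no decoding applied
    codec = detect_bom(head)
    if codec is None:
        return None  # no BOM found; inspect the file by hand instead
    # newline="" keeps the original line endings; copyfileobj streams the
    # data in chunks, so a large file is never held in memory at once.
    with open(src_path, "r", encoding=codec, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst)
    return codec
```

A call such as `transcode_to_utf8("photo.csv", "photo_utf8.csv")` (both file names are placeholders) would return `"utf-16"` for a file that starts with FF FE and write a UTF-8 copy next to it.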
