Reputation: 714
We have an MS Access .mdb file produced, I think, by an Access 2000 database. I am trying to export a table to SQL with mdbtools, using this command:
mdb-export -S -X \\ -I orig.mdb Reviewer > Reviewer.sql
That produces the file I expect, except one thing: Some of the characters are represented as question marks. This: "He wasn't ready" shows up like this: "He wasn?t ready", only in some cases (primarily single/double curly quotes), where maybe the content was pasted into the DB from MS Word. Otherwise, the data look great.
I have tried various values for "export MDB_ICONV=". I've tried using iconv on the resulting file, with ISO-8859-1 in the from/to, with UTF-8 in the from/to, with WINDOWS-1250 and WINDOWS-1252 and WINDOWS-1256 in the from, in various combinations. But I haven't succeeded in getting those curly quotes back.
Frankly, based on the way the resulting file looks, I suspect the issue is either in the original .mdb file, or in mdbtools. The malformed characters are all single question marks, but it is clear that they are not malformed versions of the same thing; so (my gut says) there's not enough data in the resulting file; so (my gut says) the issue can't be fixed in the resulting file.
Has anyone run into this one before? Any tips for moving forward? FWIW, I don't have and never have had MS Access -- the file is coming from a 3rd party -- so this could be as simple as changing something on the database, and I would be very glad to hear that.
Thanks.
Upvotes: 0
Views: 3936
Reputation: 66152
It's most likely due to the fact that the data in the Access file is UTF, and MDB Tools is trying to convert it to ascii/latin/is0-8859-1 or some other encoding. Since these encodings don't map all the UTF characters properly, you end up with question marks. The information here may help you fix your encoding issues by getting MDB Tools to use the correct encoding.
Upvotes: 0
Reputation: 27478
Looks like "smart quotes" have claimed yet another victim.
MS word takes plain ascii quotes and translates them to the double-byte left-quote and right-quote characters and translates a single quote into the double byte apostrophe character. The double byte characters in question blelong to to an MS code page which is roughly compatable with unicode-16 except for the silly quote characters.
There is a perl script called 'demoroniser.pl' which undoes all this malarky and converts the quotes back to plain ASCII.
Upvotes: 2