Reputation: 88
I want to move an oracle database from a non-unicode server (EL8ISO8859P7 character set and AL16UTF16 NCHAR character set) to a unicode server. Specifically to an Oracle Express server with AL32UTF8 character set.
Simply exporting (exp) and importing (imp) the data fails. We have a lot of the varchar2 columns with their length specified in bytes. When their contents are mapped in unicode they take more bytes and are truncated.
I tried the following:
- doubling the length of all varchar2 columns of the original database with a script (varchar2(10) becomes varchar2(20))
- exporting
- importing to the new server
And it worked. Apparently doubling is arbitrary, I probably should have changed them to the same size with CHAR semantics.
I also tried the following:
- change all varchar2 columns to nvarchar2 (same size - varchar(10) becomes nvarchar(10))
- exporting
- importing to the new server
It also worked.
Somehow the latter (converting to nvarchar) seems "cleaner". Then again you have a unicode database with unicode data types which seems weird.
So the question is: is there a suggested way to go about moving the database between the two servers? Is there any serious problem with either of the two approaches I mentioned above?
Upvotes: 1
Views: 1367
Reputation:
Migrating to unicode databases is a 4 step process.
skipping steps 2 and 3 leaves you with the byte length defined fields again and probably with a lot of errors during import because data doesn't fit in the defined columns. If there is only us characters in the source database it won't be a big problem but for example latin characters will give problems because a single character could need more bytes.
Following the listed procedure prevents the length problems. There are obviously more ways to do this but rule is to first have the ddl definition ok and insert the data later.
Upvotes: 1
Reputation: 231671
Don't use NVARCHAR2
data types unless that is your only option. The national character set exists to deal with cases where you have an existing, legacy application that does not support Unicode and you want to add a handful of columns to the system that do support Unicode without touching those legacy applications. Using NVARCHAR2
columns is great for those cases but it creates all sorts of issues in application development. Plenty of tools, APIs, and applications either don't support NVARCHAR2
columns or require additional configuration to do so. And since NVARCHAR2
columns are relatively uncommon in the Oracle world, it's very easy to spend gobs of time trying to resolve the particular issues you encounter. Less critically, since AL16UTF16
requires at least 2 bytes per character, you're likely to require quite a bit more space since much of your data is likely to consist of English characters.
I would strongly prefer migrating to the new database with character-length semantics (i.e. VARCHAR2(10 BYTE)
becomes VARCHAR2(10 CHAR)
). That avoids doubling the allowed length. It also makes it much easier to explain to users what the length limits are (or to code those validations in front-ends). It's terribly confusing to most users to explain that a particular column can sometimes hold 20 characters (when only English characters are used), can sometimes hold 10 characters (when only non-English characters are used), and can sometimes hold something in the middle (when there is a mixture of characters). Character length semantics make all those issues drastically easier.
Upvotes: 3