Reputation: 1885
I've read the following thread and I was able to make a conversion script (based on C#) that converts all my charset=NONE
databases to charset=UTF8
and most of it works great (I still have a few special cases where characters are converted to weird symbols, but that's marginal).
My issue is that I have lots of backup database files (*.fbk
) for which I don't know for sure if this is UTF8
or NONE
. In the ideal world, my code would handle the conversion once the database is restored from file depending on the fbk
file's format, so I only convert when necessary and after restore.
Is this at all possible? Or is there a way to define charset
when restoring the database (either via gback
of via ADO.NET provider)?
Upvotes: 3
Views: 3499
Reputation: 108970
In general, a Firebird database does not have a single character set. Each and every column can have its own character set. So the only thing you can do is try and use heuristics.
Use the database default character set. To be clear, the database default character set is only used when creating a new column when no explicit character set is specified. It is entirely possible for a database to have default character set UTF8, while all columns have character set WIN1251!
You can find the database default character set with the following query:
select RDB$CHARACTER_SET_NAME from RDB$DATABASE
NOTE: If the result is NULL
, then that means the default character set is NONE.
Count the different character sets of CHAR, VARCHAR and BLOB SUB_TYPE TEXT columns to see which occurs most:
select
coalesce(cs.RDB$CHARACTER_SET_NAME, 'NONE') as CHARSET,
count(*) as CHARSET_COUNT
from RDB$RELATIONS r
inner join RDB$RELATION_FIELDS rf
on rf.RDB$RELATION_NAME = r.RDB$RELATION_NAME
inner join RDB$FIELDS f
on f.RDB$FIELD_NAME = rf.RDB$FIELD_SOURCE
left join RDB$CHARACTER_SETS cs
on cs.RDB$CHARACTER_SET_ID = f.RDB$CHARACTER_SET_ID
where coalesce(r.RDB$SYSTEM_FLAG, 0) = 0
and r.RDB$VIEW_BLR is null
and (f.RDB$FIELD_TYPE in (14, 37) or f.RDB$FIELD_TYPE = 261 and f.RDB$FIELD_SUB_TYPE = 1)
group by 1
order by 2 desc
As an aside, be aware that if clients have used connection character set NONE, then it is entirely possible that the actual character set of contents of a column may not match the defined character set of that column.
Upvotes: 4