neggenbe
neggenbe

Reputation: 1885

How to determine the character set of a Firebird database

I've read the following thread and I was able to make a conversion script (based on C#) that converts all my charset=NONE databases to charset=UTF8 and most of it works great (I still have a few special cases where characters are converted to weird symbols, but that's marginal).

My issue is that I have lots of backup database files (*.fbk) for which I don't know for sure if this is UTF8 or NONE. In the ideal world, my code would handle the conversion once the database is restored from file depending on the fbk file's format, so I only convert when necessary and after restore.

Is this at all possible? Or is there a way to define charset when restoring the database (either via gback of via ADO.NET provider)?

Upvotes: 3

Views: 3499

Answers (1)

Mark Rotteveel
Mark Rotteveel

Reputation: 108970

In general, a Firebird database does not have a single character set. Each and every column can have its own character set. So the only thing you can do is try and use heuristics.

  1. Use the database default character set. To be clear, the database default character set is only used when creating a new column when no explicit character set is specified. It is entirely possible for a database to have default character set UTF8, while all columns have character set WIN1251!

    You can find the database default character set with the following query:

    select RDB$CHARACTER_SET_NAME from RDB$DATABASE 
    

    NOTE: If the result is NULL, then that means the default character set is NONE.

  2. Count the different character sets of CHAR, VARCHAR and BLOB SUB_TYPE TEXT columns to see which occurs most:

    select 
      coalesce(cs.RDB$CHARACTER_SET_NAME, 'NONE') as CHARSET, 
      count(*) as CHARSET_COUNT
    from RDB$RELATIONS r
    inner join RDB$RELATION_FIELDS rf
      on rf.RDB$RELATION_NAME = r.RDB$RELATION_NAME
    inner join RDB$FIELDS f 
      on f.RDB$FIELD_NAME = rf.RDB$FIELD_SOURCE
    left join RDB$CHARACTER_SETS cs
      on cs.RDB$CHARACTER_SET_ID = f.RDB$CHARACTER_SET_ID 
    where coalesce(r.RDB$SYSTEM_FLAG, 0) = 0
    and r.RDB$VIEW_BLR is null
    and (f.RDB$FIELD_TYPE in (14, 37) or f.RDB$FIELD_TYPE = 261 and f.RDB$FIELD_SUB_TYPE = 1)
    group by 1
    order by 2 desc
    

As an aside, be aware that if clients have used connection character set NONE, then it is entirely possible that the actual character set of contents of a column may not match the defined character set of that column.

Upvotes: 4

Related Questions