I have some characters in Farsi and I want to write them to a dbf file using my custom codepage, which is one byte per character. I think this problem can be solved in one of these two ways:
1- Passing my custom codepage to the dbf table.
2- Writing binary data directly to the dbf file without using the default codepage of the dbf package (which is utf8).
How can I solve this problem with either of these approaches?
Here is the code:
import dbf

man = 'مرد'
woman = 'زن'
row1 = (man, woman)
row2 = (man, woman)

# dbf.Table creates ./file.dbf itself, so the file does not need to be opened first
table = dbf.Table(filename='./file.dbf',
                  field_specs='field1 C(3); field2 C(3)',
                  codepage='customCodePage',  # the custom single-byte codepage
                  on_disk=True)
table.open(dbf.READ_WRITE)
table.append(row1)
table.append(row2)
table.close()
After trying to register my codec, I ended up translating my data from UTF-8 to the custom Farsi codec and then to the windows-1256 character that has the same decimal code point. So when the user reads the data with the custom codec, the windows-1256 characters point to the right code points in the custom codec. Of course, characters in this raw form are not meaningful.
For example, the letter پ has the Unicode decimal code point 1662, and in the custom codec it has code point 148. Code point 148 in windows-1256 is ”, so پ translates to ” using 3 different dictionaries. I did this for all characters on the Farsi keyboard.
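A minimal sketch of that translation, assuming a hypothetical `CUSTOM_FARSI` dict that maps each Unicode character to its byte value in the custom code page. Only the پ → 148 entry comes from the example above; the real table would cover the whole Farsi keyboard, and passing ASCII through unchanged is an assumption about the custom code page. It collapses the three dictionaries into one dict plus Python's built-in `cp1256` codec, which supplies the windows-1256 stand-in character for each byte.

```python
# Hypothetical, partial table: Unicode character -> byte value in the custom
# Farsi code page. Only the entry for پ (U+067E -> 148) comes from the answer.
CUSTOM_FARSI = {
    '\u067e': 148,  # پ
}


def to_cp1256_stand_in(text: str) -> str:
    """Replace each character with the windows-1256 character whose byte
    value equals that character's byte value in the custom code page."""
    out = []
    for ch in text:
        if ch in CUSTOM_FARSI:
            byte = CUSTOM_FARSI[ch]
        elif ord(ch) < 128:
            # Assumption: the custom code page keeps ASCII in bytes 0-127.
            byte = ord(ch)
        else:
            raise ValueError(f'no custom code point for {ch!r}')
        # Decode the single byte as windows-1256 to get the stand-in character.
        out.append(bytes([byte]).decode('cp1256'))
    return ''.join(out)


stand_in = to_cp1256_stand_in('\u067e')   # the character ”
print(stand_in.encode('cp1256'))          # b'\x94'
```

Encoding the stand-in string as windows-1256 yields exactly the bytes of the custom code page, which is why a reader using the custom codec sees the intended letters.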
dbf was designed to work with existing code pages, and so custom code pages were not considered.
If you're adventurous:
1- add an entry to dbf.code_pages with short and long descriptions, where the short one is the codec name (e.g. dbf.code_pages[0xa1] = ('farsi', 'single-byte farsi code page'))
2- register your codec with the codecs module so that codecs.getdecoder('farsi') and codecs.getencoder('farsi') (or whatever name you choose to use) return the appropriate decoder/encoder