Alex

Reputation: 531

How to write a special character into a DBF file in Python?

I'm trying to write this character É into a DBF file but I keep getting UnicodeEncodeError.

Here's how I'm doing it:

import dbf

def write_into_file(value):
    verdata_table = dbf.Table('VerData.dbf', 'VERS_BDD C(50);')

    verdata_table.open(mode=dbf.READ_WRITE)
    for record in ({"vers_bdd": value},):  # value contains the special character É
        verdata_table.append(record)

All I want is to write this character into the DBF file. I guess this has something to do with how the string is encoded when it is written to the file, but I'm not really sure.

Here is the error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xc9' in position 0: ordinal not in range(128)

EDIT

1) Here is the complete traceback:

Traceback (most recent call last):
  File "C:/Users/amunoz/Desktop/PyCharm-Projects/ac-conversion/bco1_to_bco2.py", line 15, in <module>
    write_into_file(value)
  File "C:/Users/amunoz/Desktop/PyCharm-Projects/ac-conversion/bco1_to_bco2.py", line 11, in write_into_file
    verdata_table.append(record)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 5676, in append
    gather(newrecord, dictdata, drop=drop)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 8803, in gather
    record[key] = value
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3018, in __setitem__
    self.__setattr__(name, value)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3004, in __setattr__
    self._update_field_value(name, value)
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3193, in _update_field_value
    bytes = array('B', update(value, fielddef, self._meta.memo, self._meta.input_decoder, self._meta.encoder))
  File "C:\Users\amunoz\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbf\__init__.py", line 3947, in update_character
    string = encoder(string.strip())[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xc9' in position 0: ordinal not in range(128)  

2) Here is the output of repr(value):
'Éri'

Upvotes: 2

Views: 2498

Answers (3)

Ethan Furman

Reputation: 69051

The best answer depends on whether this table is only for use with Python and the dbf package [1], or whether you need to share it with other programs.

@snakecharmerb is correct that you need to provide the appropriate code page when you create the dbf file. If the file is only for use with Python and the dbf package, you can specify 'utf8' (instead of 0xf0), but to the best of my knowledge that is not an industry-standard specification for dbf files [2].

If you need to share the file with other programs, then you'll need to decide which of the many code pages [3] is appropriate for your data set [4].
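
As a rough way to narrow down the choice, you can check which candidate encodings are able to represent your data at all. This is only a sketch using Python's built-in codecs, not part of the dbf package; the helper name `codecs_that_can_encode` and the candidate list are made up for illustration.

```python
def codecs_that_can_encode(text, candidates):
    """Return the codec names from candidates that can encode text without error."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # This code page has no byte for at least one character in text.
            continue
        usable.append(name)
    return usable


# Hypothetical shortlist; the names are Python codec names from the table in note 4.
candidates = ["ascii", "cp437", "cp850", "cp1252", "utf8"]
usable = codecs_that_can_encode("Éri", candidates)
print(usable)
```

Any name that survives this check can represent the sample value; which one to pick still depends on what the other programs reading the file expect.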

When creating the file, add the code page:

dbf.Table(table_name, table_fields, codepage=...)
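
Put together with the code from the question, that could look roughly like the sketch below. It assumes the dbf package is installed and uses only the calls that already appear in this thread, plus `close()`. The function names, the 'cp1252' default, and the length check are my own illustrative choices; the check assumes the width of a C field is counted in encoded bytes, which is worth verifying for your version of the package.

```python
def encoded_length(value, codepage):
    """Number of bytes value occupies once encoded with codepage."""
    return len(value.encode(codepage))


def write_version(path, value, codepage="cp1252", width=50):
    """Create a one-field table at path and append value to it."""
    import dbf  # third-party; imported here so encoded_length works without it

    # Assumption: the C(width) limit applies to the encoded bytes, not characters.
    if encoded_length(value, codepage) > width:
        raise ValueError("value does not fit in C(%d) as %s" % (width, codepage))

    table = dbf.Table(path, "VERS_BDD C(%d);" % width, codepage=codepage)
    table.open(mode=dbf.READ_WRITE)
    try:
        table.append({"vers_bdd": value})
    finally:
        table.close()  # make sure the record is flushed even if append fails
```

Calling `write_version('VerData.dbf', 'Éri')` would then create the table with a Windows ANSI code page instead of the ASCII default.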

[1] Disclosure: I am the author of the dbf package.

[2] I added 'utf8' primarily for my own convenience.

[3] See the sections on DOS and Windows Emulation code pages.

[4] Currently supported code pages (use either the hex code or the first string from the tuple pair):

    0x00 : ('ascii', "plain ol' ascii"),
    0x01 : ('cp437', 'U.S. MS-DOS'),
    0x02 : ('cp850', 'International MS-DOS'),
    0x03 : ('cp1252', 'Windows ANSI'),
    0x04 : ('mac_roman', 'Standard Macintosh'),
    0x08 : ('cp865', 'Danish OEM'),
    0x09 : ('cp437', 'Dutch OEM'),
    0x0A : ('cp850', 'Dutch OEM (secondary)'),
    0x0B : ('cp437', 'Finnish OEM'),
    0x0D : ('cp437', 'French OEM'),
    0x0E : ('cp850', 'French OEM (secondary)'),
    0x0F : ('cp437', 'German OEM'),
    0x10 : ('cp850', 'German OEM (secondary)'),
    0x11 : ('cp437', 'Italian OEM'),
    0x12 : ('cp850', 'Italian OEM (secondary)'),
    0x13 : ('cp932', 'Japanese Shift-JIS'),
    0x14 : ('cp850', 'Spanish OEM (secondary)'),
    0x15 : ('cp437', 'Swedish OEM'),
    0x16 : ('cp850', 'Swedish OEM (secondary)'),
    0x17 : ('cp865', 'Norwegian OEM'),
    0x18 : ('cp437', 'Spanish OEM'),
    0x19 : ('cp437', 'English OEM (Britain)'),
    0x1A : ('cp850', 'English OEM (Britain) (secondary)'),
    0x1B : ('cp437', 'English OEM (U.S.)'),
    0x1C : ('cp863', 'French OEM (Canada)'),
    0x1D : ('cp850', 'French OEM (secondary)'),
    0x1F : ('cp852', 'Czech OEM'),
    0x22 : ('cp852', 'Hungarian OEM'),
    0x23 : ('cp852', 'Polish OEM'),
    0x24 : ('cp860', 'Portuguese OEM'),
    0x25 : ('cp850', 'Portuguese OEM (secondary)'),
    0x26 : ('cp866', 'Russian OEM'),
    0x37 : ('cp850', 'English OEM (U.S.) (secondary)'),
    0x40 : ('cp852', 'Romanian OEM'),
    0x4D : ('cp936', 'Chinese GBK (PRC)'),
    0x4E : ('cp949', 'Korean (ANSI/OEM)'),
    0x4F : ('cp950', 'Chinese Big 5 (Taiwan)'),
    0x50 : ('cp874', 'Thai (ANSI/OEM)'),
    0x57 : ('cp1252', 'ANSI'),
    0x58 : ('cp1252', 'Western European ANSI'),
    0x59 : ('cp1252', 'Spanish ANSI'),
    0x64 : ('cp852', 'Eastern European MS-DOS'),
    0x65 : ('cp866', 'Russian MS-DOS'),
    0x66 : ('cp865', 'Nordic MS-DOS'),
    0x67 : ('cp861', 'Icelandic MS-DOS'),
    0x68 : (None, 'Kamenicky (Czech) MS-DOS'),
    0x69 : (None, 'Mazovia (Polish) MS-DOS'),
    0x6a : ('cp737', 'Greek MS-DOS (437G)'),
    0x6b : ('cp857', 'Turkish MS-DOS'),
    0x78 : ('cp950', 'Traditional Chinese (Hong Kong SAR, Taiwan) Windows'),
    0x79 : ('cp949', 'Korean Windows'),
    0x7a : ('cp936', 'Chinese Simplified (PRC, Singapore) Windows'),
    0x7b : ('cp932', 'Japanese Windows'),
    0x7c : ('cp874', 'Thai Windows'),
    0x7d : ('cp1255', 'Hebrew Windows'),
    0x7e : ('cp1256', 'Arabic Windows'),
    0xc8 : ('cp1250', 'Eastern European Windows'),
    0xc9 : ('cp1251', 'Russian Windows'),
    0xca : ('cp1254', 'Turkish Windows'),
    0xcb : ('cp1253', 'Greek Windows'),
    0x96 : ('mac_cyrillic', 'Russian Macintosh'),
    0x97 : ('mac_latin2', 'Macintosh EE'),
    0x98 : ('mac_greek', 'Greek Macintosh'),
    0xf0 : ('utf8', '8-bit unicode'),

Upvotes: 4

snakecharmerb

Reputation: 55749

Looking at the source, Table objects accept a codepage parameter in their __init__ method. It overrides the default code page, which appears to be ASCII. So you probably need to create your table like this:

def write_into_file(value):
    verdata_table = dbf.Table('VerData.dbf', 'VERS_BDD C(50);', codepage=0xf0)

(0xf0 is the hex code dbf uses for UTF-8 - see the table in dbf/__init__.py)
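
To see why the default fails, here is a small illustration using only Python's built-in codecs (nothing here touches the dbf package). It encodes the value from the question with ASCII, cp1252, and UTF-8; the variable names are just for the example.

```python
value = "Éri"

# ASCII only covers code points 0-127, so É (U+00C9) cannot be encoded.
try:
    value.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
else:
    ascii_error = None

as_cp1252 = value.encode("cp1252")  # b'\xc9ri': one byte per character
as_utf8 = value.encode("utf8")      # b'\xc3\x89ri': É takes two bytes

print(ascii_error)
print(as_cp1252, len(as_cp1252))
print(as_utf8, len(as_utf8))
```

The first line printed is the same message as in the traceback, so the failure comes from the ASCII encoder and not from the value itself.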

Upvotes: 2

Kris

Reputation: 8868

You need to override the default input encoding, which is ASCII. Set the input encoding to "utf-8" like this:

dbf.input_decoding = "utf-8"

After this, you can open and write the file.

Upvotes: -1
