python odfpy AttributeError: Text instance has no attribute encode

Question

I'm trying to read an ods (OpenDocument spreadsheet) document with the odfpy module. So far I've been able to extract some data, but whenever a cell contains non-ASCII characters (such as ą) the script fails with:

Traceback (most recent call last):
  File "python/test.py", line 26, in <module>
    print x.firstChild
  File "/usr/lib/python2.7/site-packages/odf/element.py", line 247, in __str__
    return self.data.encode()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0105' in position 4: ordinal not in range(128)

I tried to force an encoding on the output, but that fails as well:

Traceback (most recent call last):
  File "python/test.py", line 27, in <module>
    print x.firstChild.encode('utf-8', 'ignore')
AttributeError: Text instance has no attribute 'encode'

What is the problem here, and how can it be solved without editing the module code (which I'd like to avoid at all costs)? Is there an alternative to calling encode on the output that would work?

Here is my code:

from odf.opendocument import Spreadsheet
from odf.opendocument import load
from odf.table import Table,TableRow,TableCell
from odf.text import P
import sys,codecs
doc = load(sys.argv[1])
d = doc.spreadsheet
tables = d.getElementsByType(Table)
for table in tables:
  tName = table.attributes[(u'urn:oasis:names:tc:opendocument:xmlns:table:1.0', u'name')]
  print tName
  rows = table.getElementsByType(TableRow)
  for row in rows[:2]:
    cells = row.getElementsByType(TableCell)
    for cell in cells:
      tps = cell.getElementsByType(P)
      if len(tps)>0:
        for x in tps:
          #print x.firstChild
          print x.firstChild.encode('utf-8', 'ignore')

WKPlus · Accepted Answer

Maybe you are not using the latest odfpy. In the latest version, the __str__ method of Text is implemented as:

def __str__(self):
    return self.data

Update odfpy to the latest version and change your code to:

print x.firstChild.__str__().encode('utf-8', 'ignore')
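To make both errors concrete, here is a small sketch using two hypothetical stand-in classes, OldText and NewText, which are not odfpy's real classes. They only mimic the two __str__ implementations in play: the older one visible in the traceback (self.data.encode()) and the newer one quoted above (return self.data). The sketch is written for Python 3, so the ASCII codec that Python 2 uses implicitly is spelled out.

```python
# Sketch for Python 3. OldText and NewText are hypothetical stand-ins that
# only mimic the two __str__ versions discussed here; they are not odfpy classes.

class OldText(object):
    """Mimics the older Text.__str__ seen in the traceback."""

    def __init__(self, data):
        self.data = data  # the unicode content of the text node

    def __str__(self):
        # Python 2's encode() with no argument uses ASCII; spelled out here
        # so the same UnicodeEncodeError appears on Python 3.
        return self.data.encode("ascii")


class NewText(OldText):
    """Mimics the newer Text.__str__ quoted in the answer."""

    def __str__(self):
        return self.data


old = OldText(u"pi\u0105tek")
new = NewText(u"pi\u0105tek")

try:
    str(old)
except UnicodeEncodeError as exc:
    print("old __str__ fails:", exc)

print(hasattr(old, "encode"))         # False: a node object, not a string
print(str(new) == u"pi\u0105tek")     # True
print(new.__str__().encode("utf-8"))  # b'pi\xc4\x85tek'
```

On Python 2 there is one more wrinkle: print x.firstChild goes through str(), which converts a unicode value returned by __str__ back to a byte string with the ASCII codec and can fail again. Calling x.firstChild.__str__() directly, as the answer does, hands back the unicode object itself, so .encode('utf-8', 'ignore') can be applied to it.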

UPDATE

There is another method for getting the raw unicode data from a Text node: __unicode__. So if you don't want to update odfpy, change your code to:

print x.firstChild.__unicode__().encode('utf-8', 'ignore')
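The question also asks for an alternative to calling encode on every value. One option, which is not part of the accepted answer and is only sketched here under assumptions, is to wrap the output stream in a UTF-8 writer from the codecs module (which the question's code already imports but never uses) and write each node's .data, the unicode string that the traceback shows Text keeping. TextNode and write_text are hypothetical names; the stand-in models nothing but the .data attribute, and the sketch assumes every node passed in is a Text node, as the question's code does.

```python
import codecs
import io


class TextNode(object):
    """Hypothetical stand-in for an odfpy Text node; only .data is modelled."""

    def __init__(self, data):
        self.data = data


def write_text(nodes, stream):
    """Write each node's unicode text to a byte stream as UTF-8 lines."""
    # StreamWriter encodes every unicode string before passing it to `stream`
    writer = codecs.getwriter("utf-8")(stream)
    for node in nodes:
        # Read the raw unicode content directly instead of going through __str__
        writer.write(node.data)
        writer.write(u"\n")


buf = io.BytesIO()
write_text([TextNode(u"pi\u0105tek"), TextNode(u"abc")], buf)
print(buf.getvalue())  # b'pi\xc4\x85tek\nabc\n'
```

With real odfpy nodes you would pass the x.firstChild objects instead of the stand-ins; the target stream would be sys.stdout on Python 2 and sys.stdout.buffer on Python 3, since the writer emits encoded bytes.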
