1234567u80987

Reputation: 3

python odfpy AttributeError: Text instance has no attribute encode

I'm trying to read an ODS (OpenDocument Spreadsheet) file with the odfpy module. So far I've been able to extract some data, but whenever a cell contains non-ASCII characters the script fails with:

Traceback (most recent call last):
  File "python/test.py", line 26, in <module>
    print x.firstChild
  File "/usr/lib/python2.7/site-packages/odf/element.py", line 247, in __str__
    return self.data.encode()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0105' in position 4: ordinal not in range(128)

I tried to force an encoding on the output but apparently it does not go well with print:

Traceback (most recent call last):
  File "python/test.py", line 27, in <module>
    print x.firstChild.encode('utf-8', 'ignore')
AttributeError: Text instance has no attribute 'encode'

What is the problem here, and how can it be solved without editing the module code (which I'd like to avoid at all costs)? Is there an alternative to calling encode on the output that would work?

Here is my code:

from odf.opendocument import Spreadsheet
from odf.opendocument import load
from odf.table import Table,TableRow,TableCell
from odf.text import P
import sys,codecs
doc = load(sys.argv[1])
d = doc.spreadsheet
tables = d.getElementsByType(Table)
for table in tables:
  tName = table.attributes[(u'urn:oasis:names:tc:opendocument:xmlns:table:1.0', u'name')]
  print tName
  rows = table.getElementsByType(TableRow)
  for row in rows[:2]:
    cells = row.getElementsByType(TableCell)
    for cell in cells:
      tps = cell.getElementsByType(P)
      if len(tps)>0:
        for x in tps:
          #print x.firstChild
          print x.firstChild.encode('utf-8', 'ignore')

Upvotes: 0

Views: 1854

Answers (2)

WKPlus

Reputation: 7255

You may not be using the latest odfpy. In the latest version, the __str__ method of Text is implemented as:

def __str__(self):
    return self.data

Update odfpy to the latest version and change your code to:

print x.firstChild.__str__().encode('utf-8', 'ignore')

UPDATE

There is another method for getting the raw unicode data from a Text node: __unicode__. So if you don't want to update odfpy, change your code to:

print x.firstChild.__unicode__().encode('utf-8', 'ignore')

Upvotes: 1

Anand S Kumar

Reputation: 90999

It seems the library itself is calling encode() -

return self.data.encode()

Calling encode() without arguments uses the system default encoding, which in your case seems to be ascii. You can check that with -

import sys
sys.getdefaultencoding()
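
To see why the default matters, here is a minimal sketch that reproduces the failure without odfpy. The sample string is made up for illustration; it only needs to contain U+0105, the character named in the traceback.

```python
import sys

# Hypothetical sample text containing U+0105, the character from the traceback.
text = u"k\u0105t"

# 'ascii' on a stock Python 2.7 install; 'utf-8' on Python 3.
print(sys.getdefaultencoding())

# Encoding with the ascii codec fails for any character above 127.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Naming a codec that covers the character works.
utf8_bytes = text.encode("utf-8")
```

So the error comes from the codec that encode() falls back to, not from print itself.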

From the traceback, it seems the actual data is stored in an attribute called data.

Try doing the below instead -

print x.firstChild.data
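
A rough sketch of why this should work, using a stand-in class rather than odfpy itself. FakeText is hypothetical and only imitates the __str__ shown in the traceback; the real Text class may differ in other respects.

```python
class FakeText(object):
    """Hypothetical stand-in for odfpy's Text node, not the real class."""

    def __init__(self, data):
        # Unicode payload, like the self.data seen in the traceback.
        self.data = data

    def __str__(self):
        # Imitates the older behaviour: encode with the ascii codec.
        return self.data.encode("ascii")


node = FakeText(u"k\u0105t")

# Going through __str__ (which is what "print node" does) fails.
try:
    str(node)
    str_ok = True
except UnicodeEncodeError:
    str_ok = False

# Reading .data directly bypasses __str__, so you choose the codec yourself.
encoded = node.data.encode("utf-8")
```

If print x.firstChild.data still raises UnicodeEncodeError (for example when output is piped), encoding explicitly as in the last line is the fallback.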

Upvotes: 0
