Reputation: 1379
I am working with data of the following form.
   itemid    category     subcategory                title
1  10000010  Транспорт    Автомобили с пробегом      Toyota Sera, 1991
2  10000025  Услуги       Предложения услуг          Монтаж кровли
3  10000094  Личные вещи  Одежда, обувь, аксессуары  Костюм Steilmann
4  10000101  Транспорт    Автомобили с пробегом      Ford Focus, 2011
5  10000132  Транспорт    Запчасти и аксессуары      Турбина 3.0 Bar
6  10000152  Транспорт    Автомобили с пробегом      ВАЗ 2115 Samara, 2005
Now I run the following commands:
import pandas as pd
trainingData = pd.read_table("train.tsv", nrows=10, header=0, encoding='utf-8')
trainingData['itemid'].head()

0    10000010
1    10000025
2    10000094
3    10000101
4    10000132
Name: itemid
Everything works up to this point, but when I select two columns, like this:
trainingData[['itemid','category']].head()

Error:
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/home/vikram/Documents/Avito/ in ()
----> 1 trainingData[['itemid','category']].head()

/usr/lib/python2.7/dist-packages/IPython/core/displayhook.pyc in __call__(self, result)
    236 self.start_displayhook()
    237 self.write_output_prompt()
--> 238 format_dict = self.compute_format_data(result)
    239 self.write_format_data(format_dict)
    240 self.update_user_ns(result)

/usr/lib/python2.7/dist-packages/IPython/core/displayhook.pyc in compute_format_data(self, result)
    148 MIME type representation of the object.
    149 """
--> 150 return self.shell.display_formatter.format(result)
    151
    152 def write_format_data(self, format_dict):

/usr/lib/python2.7/dist-packages/IPython/core/formatters.pyc in format(self, obj, include, exclude)
    124 continue
    125 try:
--> 126 data = formatter(obj)
    127 except:
    128 # FIXME: log the exception

/usr/lib/python2.7/dist-packages/IPython/core/formatters.pyc in __call__(self, obj)
    445 type_pprinters=self.type_printers,
    446 deferred_pprinters=self.deferred_printers)
--> 447 printer.pretty(obj)
    448 printer.flush()
    449 return stream.getvalue()

/usr/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in pretty(self, obj)
    352 if callable(obj_class._repr_pretty_):
    353 return obj_class._repr_pretty_(obj, self, cycle)
--> 354 return _default_pprint(obj, self, cycle)
    355 finally:
    356 self.end_group()

/usr/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
    472 if getattr(klass, '__repr__', None) not in _baseclass_reprs:
    473 # A user-provided repr.
--> 474 p.text(repr(obj))
    475 return
    476 p.begin_group(1, '
    456 self.to_string(buf=buf)
    457 value = buf.getvalue()
    458 if max([len(l) for l in value.split('\n')]) > terminal_width:

/usr/lib/pymodules/python2.7/pandas/core/frame.pyc in to_string(self, buf, columns, col_space, colSpace, header, index, na_rep, formatters, float_format, sparsify, nanRep, index_names, justify, force_unicode)
   1024 index_names=index_names,
   1025 header=header, index=index)
-> 1026 formatter.to_string(force_unicode=force_unicode)
   1027
   1028 if buf is None:

/usr/lib/pymodules/python2.7/pandas/core/format.pyc in to_string(self, force_unicode)
    176 for i, c in enumerate(self.columns):
    177 if self.header:
--> 178 fmt_values = self._format_col(c)
    179 cheader = str_columns[i]
    180 max_len = max(max(len(x) for x in fmt_values),

/usr/lib/pymodules/python2.7/pandas/core/format.pyc in _format_col(self, col)
    217 float_format=self.float_format,
    218 na_rep=self.na_rep,
--> 219 space=self.col_space)
    220
    221 def to_html(self):

/usr/lib/pymodules/python2.7/pandas/core/format.pyc in format_array(values, formatter, float_format, na_rep, digits, space, justify)
    424 justify=justify)
    425
--> 426 return fmt_obj.get_result()
    427
    428

/usr/lib/pymodules/python2.7/pandas/core/format.pyc in get_result(self)
    471 fmt_values.append(float_format(v))
    472 else:
--> 473 fmt_values.append(' %s' % _format(v))
    474
    475 return _make_fixed_width(fmt_values, self.justify)

/usr/lib/pymodules/python2.7/pandas/core/format.pyc in _format(x)
    457 else:
    458 # object dtype
--> 459 return '%s' % formatter(x)
    460
    461 vals = self.values

/usr/lib/pymodules/python2.7/pandas/core/common.pyc in _stringify(col)
    503 def _stringify(col):
    504 # unicode workaround
--> 505 return unicode(col)
    506
    507 def _maybe_make_list(obj):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
Please help me display the data properly.
Upvotes: 1
Views: 2327
Reputation: 306
I had the same issue: IPython could not display the non-ASCII text returned by the pandas head() function. It turned out that the default encoding for Python was set to 'ascii' on my machine. You can check this with:
import sys
sys.getdefaultencoding()
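To see why an 'ascii' default leads to exactly the error in the question, here is a minimal sketch of my own (it is not taken from the traceback) that reproduces the failure outside pandas and IPython. It assumes the column values are UTF-8 byte strings, which is what the 0xd0 byte in the error message suggests. On Python 2, the unicode(col) call at the bottom of the traceback performs this decode implicitly with the default encoding; the sketch makes the decode explicit so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: decode UTF-8 encoded Cyrillic text with the wrong codec.

# The bytes of one category value, as they would be stored in a UTF-8 file.
raw = u"Транспорт".encode("utf-8")

# bytearray indexing yields an int on both Python 2 and Python 3.
first_byte = bytearray(raw)[0]
print(hex(first_byte))

# Decoding with 'ascii' fails on the very first byte, as in the question.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding with the codec that matches the data succeeds.
decoded = raw.decode("utf-8")
```

The first byte of every Cyrillic value in the sample data falls outside the ASCII range, so any implicit ASCII decode fails at position 0.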
The solution was to re-set the default encoding to UTF-8:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
After this, IPython displayed Pandas data frames with non-ASCII characters correctly.
Note that the reload call is necessary to make the setdefaultencoding function available. Without it you'll get the error:
AttributeError: 'module' object has no attribute 'setdefaultencoding'
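The reload trick only exists on Python 2, so if the same script may also run on Python 3, the workaround could be wrapped in a guard. The following is only a sketch under that assumption; the helper name ensure_utf8_default_encoding is hypothetical and not part of any library.

```python
import sys


def ensure_utf8_default_encoding():
    """Hypothetical helper: apply the workaround above on Python 2 only."""
    if sys.version_info[0] >= 3:
        # On Python 3 the default encoding is already 'utf-8' and
        # sys.setdefaultencoding does not exist, so there is nothing to do.
        return sys.getdefaultencoding()
    # Python 2: reload(sys) makes setdefaultencoding available again.
    reload(sys)  # noqa: F821 (builtin on Python 2 only, never reached on 3)
    sys.setdefaultencoding("utf-8")
    return sys.getdefaultencoding()


current_encoding = ensure_utf8_default_encoding()
print(current_encoding)
```

Call it once at the start of the session, before displaying any data frames.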
Upvotes: 6