codec can't decode byte (and the solutions I've seen to this error haven't helped)

Question

I'm trying to pull units out of a data file to use in its post-processing. The file is a .csv, and after struggling with it, I've resorted to using pandas for the channel names and the data itself, skipping the two rows that follow the names (the units row and the "Raw" row).

I'm separately using np.genfromtxt to extract the units:

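import os
import numpy as np
import pandas as pd

def get_df(f):
    # header is line index 5 (channel names); skip the units (6) and "Raw" (7) rows
    df = pd.read_csv(os.path.join(pathname, f), skiprows=[0, 1, 2, 3, 4, 6, 7])
    # units row only, returned as raw byte strings because of dtype=np.string_
    units = np.genfromtxt(os.path.join(pathname, f), skip_header=6, delimiter=',', max_rows=1, dtype=np.string_)
    return df, units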

Since some of these units contain '/', I replace it with ' per ' (these values end up joined to the channel names and used in the file names of the generated plots).

df, units = get_df(f)

unit_dict = {}
for column, unit in zip(df.columns, units):
    # '/' can't appear in file names, so spell it out
    unit = unit.replace('/', ' per ')
    unit_dict[column] = unit
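A minimal sketch of that loop, assuming the units come back as raw bytes (which is what `dtype=np.string_` gives) and assuming the file is Windows-1252 (cp1252). The idea is to decode first, so the '/' replacement and everything after it work on text rather than bytes. `build_unit_dict` and `FILE_ENCODING` are hypothetical names; the sketch is written for Python 3, and the `isinstance(raw, bytes)` check also works on Python 2.7, where `bytes` is an alias of `str`.

```python
# ASSUMPTION: the data file is Windows-1252 (inferred from the 0xb0 byte).
FILE_ENCODING = "cp1252"


def build_unit_dict(columns, raw_units, encoding=FILE_ENCODING):
    """Hypothetical helper: map each channel name to its decoded unit, '/' spelled out."""
    unit_dict = {}
    for column, raw in zip(columns, raw_units):
        # bytes -> text; values that are already text are left as they are
        unit = raw.decode(encoding) if isinstance(raw, bytes) else raw
        # '/' is not allowed in file names, so spell it out
        unit_dict[column] = unit.replace(u"/", u" per ")
    return unit_dict


units = build_unit_dict(["ActSpeed", "CellAmbTemp"], [b"rev/min", b"\xb0C"])
```

Here `units` maps 'ActSpeed' to 'rev per min' and 'CellAmbTemp' to '°C'.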

When I get to a channel name that has a degree symbol in it, I get an error:

CellAmbTemp �C
Traceback (most recent call last):
  File "filepath_omitted/Processing.py", line 112, in <module>
    df_average[column], column)
  File "path/Processing.py", line 30, in contour_plot
    plt.title(column_name)
  File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 1465, in title
    return gca().set_title(s, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axes\_axes.py", line 186, in set_title
    title.set_text(label)
  File "C:\Python27\lib\site-packages\matplotlib\text.py", line 1212, in set_text
    self._text = '%s' % (s,)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 12: ordinal not in range(128)

Process finished with exit code 1

I printed the dictionary that pairs channels with units, and in this case the entry looks like:

'CellAmbTemp': '\xb0C'

What encoding is that?
I've tried various things, such as string.decode(), unicode(string), and dtype = unicode_ in genfromtxt.
Is there a better way to do what I need to do? Or at least cobble something together to fix it?
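On the question of which encoding produces '\xb0C': a lone 0xb0 byte is the degree sign in Latin-1 (ISO-8859-1) and in Windows-1252 (cp1252), and it is not valid UTF-8, where the degree sign is the two bytes 0xc2 0xb0. That suggests, though it does not prove, that the file is Latin-1 or cp1252. A short Python 3 check, using only `bytes.decode`, `str.encode` and the standard `unicodedata` module:

```python
import unicodedata

raw = b"\xb0C"  # the value seen in the unit dictionary

# In both single-byte Western encodings, 0xb0 decodes to U+00B0.
for enc in ("latin-1", "cp1252"):
    text = raw.decode(enc)
    print(enc, unicodedata.name(text[0]))

# As UTF-8 the same bytes fail: 0xb0 is a continuation byte, not a start byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc)

# The UTF-8 spelling of the degree sign is two bytes.
print(u"\u00b0C".encode("utf-8"))
```

Both single-byte decodes report the character as DEGREE SIGN, while the UTF-8 decode raises.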

Added: chunk of the file

Logger description:                                     
Log period: 1 s                                 
Statistics period: 30 s                                 
Statistics window: 300 s                                    
Maximum duration:                                   
Time    Time    Time    ActSpeed    ActTorque   ActPower    FuelMassFlowRate    BarometricPress CellAmbTemp ChargeCoolerInPressG
Date    Time    ms  rev/min Nm  kW  g/h kPa °C  kPa
Raw Raw Raw Raw Raw Raw Raw Raw Raw Raw
1/12/2018   12:30:01 PM 153.4   600.0856308 132.4150085 7.813595703 2116.299996 97.76997785 11.29989827 0.294584802
1/12/2018   12:30:02 PM 153.4   600.1700702 132.7327271 7.989128906 2271.800016 97.76997785 11.29989827 0.336668345
1/12/2018   12:30:03 PM 153.4   600.0262537 128.7541351 7.427545898 2783.199996 97.78462672 11.29989827 0.241980373

ETA:

I ended up switching how I acquired the units to pandas:

import os
import pandas as pd

def get_df(f):
    df = pd.read_csv(os.path.join(pathname, f), skiprows=[0, 1, 2, 3, 4, 6, 7])
    # start reading at line index 6, so the units row becomes the header
    units = pd.read_csv(os.path.join(pathname, f), skiprows=6, delimiter=',')
    units = units.columns
    return df, units
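Since `pd.read_csv` accepts an `encoding` argument, one option is to pass the file's encoding there. The sketch below instead reads just the names and units lines with the standard library and decodes them explicitly. It assumes cp1252, a comma-delimited file, no quoted fields in those two lines, and the same row layout as the `skiprows` lists (names at line index 5, units at index 6); `read_names_and_units` is a hypothetical name. It uses only `io.open`, `itertools.islice` and `str.split`, which behave the same way on Python 2.7 and 3.

```python
import io
from itertools import islice


def read_names_and_units(path, encoding="cp1252", names_row=5, units_row=6):
    """Hypothetical helper: return (channel_names, units) as decoded text.

    ASSUMPTIONS: cp1252 file, comma-delimited, no quoted fields in these two
    lines, names on line index 5 and units on line index 6.
    """
    with io.open(path, "r", encoding=encoding) as fh:
        # read only the header lines, not the data
        head = list(islice(fh, units_row + 1))
    names = [field.strip() for field in head[names_row].split(u",")]
    units = [field.strip() for field in head[units_row].split(u",")]
    return names, units
```

Reading the row as plain data also keeps the duplicate 'kPa' as 'kPa'; when the units row is used as a pandas header, as above, pandas renames the second one to 'kPa.1'.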

Then I decoded and re-encoded the units outside the function:

import chardet

df, units = get_df(f)

unit_dict = {}
for column, unit in zip(df.columns, units):
    # guess the encoding of each unit string, then re-encode it as UTF-8
    encoding = chardet.detect(unit)['encoding']
    unit = unit.decode(str(encoding)).encode('utf-8')
    unit_dict[column] = unit

Now I still get the error when I try to use that text as the title of a matplotlib plot, but the code gets farther before it fails.
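One likely reason the error survives: `.encode('utf-8')` turns the decoded text back into bytes, just different non-ASCII bytes (0xc2 0xb0 instead of 0xb0). On Python 2, non-ASCII bytes that get mixed with unicode text are implicitly decoded as ASCII, which is what produces `'ascii' codec can't decode byte`. A minimal sketch of the alternative, assuming cp1252 input: decode once and hand the resulting text to `plt.title` unchanged. `make_label` is a hypothetical helper; the code is shown in Python 3 but uses only calls that also exist in 2.7.

```python
# ASSUMPTION: the raw unit bytes are cp1252, like the sample value b"\xb0C".
def make_label(column, raw_unit, encoding="cp1252"):
    """Hypothetical helper: build a plot title from a channel name and raw unit bytes."""
    unit = raw_unit.decode(encoding)         # bytes -> text, done exactly once
    unit = unit.replace(u"/", u" per ")      # same substitution as in the question
    return u"{0} {1}".format(column, unit)   # text joined with text stays text


label = make_label(u"CellAmbTemp", b"\xb0C")

# What the loop above does instead: back to bytes, now 0xc2 0xb0.
reencoded = label.encode("utf-8")
```

`label` is the text 'CellAmbTemp °C' and can be passed straight to `plt.title(label)`; `reencoded` is `b'CellAmbTemp \xc2\xb0C'`, the kind of value that can fail again on Python 2.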

progmatico · Accepted Answer

You have to know the encoding of your input file (or just try the common utf-8). If you don't know it and utf-8 does not work, run chardet on the file and use the encoding it reports.
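A minimal sketch of that advice, assuming the whole file fits in memory: read the raw bytes, try UTF-8 first, then chardet's guess for the whole file (only if the optional `chardet` package is installed; a guess from a two-byte unit string is much less reliable), then fixed fallbacks. `decode_file` and its fallback list are hypothetical choices, not part of any library. A successful decode only means the bytes are valid in that encoding, not that it is the right one; latin-1 accepts every byte, so it goes last.

```python
import io


def decode_file(path, fallbacks=("cp1252", "latin-1"), use_chardet=True):
    """Hypothetical helper: return (text, encoding) for the file at path."""
    with io.open(path, "rb") as fh:
        raw = fh.read()
    # utf-8-sig decodes plain UTF-8 and also drops a leading byte-order mark
    candidates = ["utf-8-sig"]
    if use_chardet:
        try:
            import chardet  # optional third-party package
            guess = chardet.detect(raw)["encoding"]  # may be None
            if guess:
                candidates.append(guess)
        except ImportError:
            pass
    candidates.extend(fallbacks)
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue  # invalid for this encoding, or unknown encoding name
    # latin-1 maps every byte, so with the default fallbacks this is unreachable
    raise ValueError("could not decode %r" % (path,))
```

Once the encoding is known, the same name can be given to `io.open` or to `pd.read_csv(..., encoding=...)` when reading the data.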

Answers (2)
