

Python: normalizing some of the columns of a pandas DataFrame

I have a DataFrame from which I want to normalize some arbitrary columns using another arbitrary column:

import itertools as it
import numpy as np
import pandas as pd

header = tuple(['h_seqNum', 'h_stamp', 'user_id'])
joints = tuple(['head', 'neck', 'torso'])
attribs = tuple(['pos_x','pos_y','pos_z'])

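# zip(*...) transposes the (joint, attrib) pairs into two parallel tuples:
# ('head', 'head', 'head', 'neck', ...) and ('pos_x', 'pos_y', 'pos_z', 'pos_x', ...)
all_columns = list(zip(*it.product(joints, attribs)))
multiind_first = list(it.chain(['header'] * len(header), all_columns[0], ['pose']))
multiind_second = list(it.chain(header, all_columns[1], ['pose']))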

df = pd.DataFrame(np.random.rand(65).reshape(5,13),  columns = pd.MultiIndex.from_arrays([multiind_first, multiind_second], names=['joint', 'attrib']))
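As a side note, the same two-level column index can be built directly from a list of (joint, attrib) tuples with pd.MultiIndex.from_tuples, which avoids zipping and chaining parallel lists. A minimal sketch reusing the question's header, joints and attribs; column_tuples is an illustrative name, not something from the original code:

```python
import itertools as it

import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

# One (joint, attrib) tuple per column, in display order:
# the header block, then every joint/attribute pair, then 'pose'.
column_tuples = (
    [('header', h) for h in header]
    + list(it.product(joints, attribs))
    + [('pose', 'pose')]
)
columns = pd.MultiIndex.from_tuples(column_tuples, names=['joint', 'attrib'])

df = pd.DataFrame(np.random.rand(5, len(columns)), columns=columns)
print(df.shape)  # (5, 13)
```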

The resulting DataFrame is something like this one:

joint    header                            head                       neck                       torso                      pose
attrib   h_seqNum    h_stamp    user_id    pos_x    pos_y    pos_z    pos_x    pos_y    pos_z    pos_x    pos_y    pos_z    pose
0        0.681       0.059      0.607      0.093    0.504    0.975    0.317    0.739    0.129    0.759    0.254    0.814    1
1        0.914       0.420      0.305      0.242    0.700    0.180    0.324    0.171    0.477    0.943    0.877    0.069    0
2        0.522       0.395      0.118      0.739    0.653    0.326    0.947    0.517    0.036    0.647    0.079    0.227    0
3        0.475       0.815      0.792      0.208    0.472    0.427    0.213    0.544    0.440    0.033    0.636    0.527    2
4        0.767       0.774      0.983      0.646    0.949    0.947    0.402    0.015    0.913    0.734    0.192    0.032    0    

I want to normalize all the columns (attribs) belonging to an arbitrary joint (e.g. 'head') using another arbitrary joint (e.g. 'torso'). For instance, something like:

df['head'] = df['head'] - df['torso']
df['neck'] = df['neck'] - df['torso']
# Note that torso remains "unnormalized"

To do so I wrote a function:

def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint])
    for j in joint_names:
        df[j] = df[j] - df[from_joint]

However, when I execute this function I get the following error:

normalize_joints(df, 'torso')

AttributeError                            Traceback (most recent call last)
<ipython-input-414-47f39f04716d> in <module>()
----> 1 normalize_joints(df, 'torso')

<ipython-input-407-cf13a67fabd8> in normalize_joints(df, from_joint)
      2     joint_names = set(joints) - set([from_joint,])
      3     for j in list(joint_names):
----> 4         df[j] = df[j] - df[from_joint]

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2117                                          fill_value, limit, takeable=takeable)
-> 2119         return frame
   2121     def _reindex_index(self, new_index, method, copy, level, fill_value=NA,

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2164     @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)
   2165     def reindex_axis(self, labels, axis=0, method=None, level=None, copy=True,
-> 2166                      limit=None, fill_value=np.nan):
   2167         return super(DataFrame, self).reindex_axis(labels=labels, axis=axis,
   2168                                                    method=method, level=level,

/Library/Python/2.7/site-packages/pandas/core/generic.pyc in _set_item(self, key, value)
    678     __bool__ = __nonzero__
--> 679 
    680     def bool(self):
    681         """ Return the bool of a single element PandasObject

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in set(self, item, value)
   1768     def sp_index(self):
   1769         return self.values.sp_index
-> 1770 
   1771     @property
   1772     def kind(self):

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _reset_ref_locs(self)
   1054         # see if we can align other
   1055         if hasattr(other, 'reindex_axis'):
-> 1056             if align:
   1057                 axis = getattr(other, '_info_axis_number', 0)
   1058                 other = other.reindex_axis(self.items, axis=axis,

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _rebuild_ref_locs(self)
   1063         # make sure that we can broadcast
-> 1064         is_transposed = False
   1065         if hasattr(other, 'ndim') and hasattr(values, 'ndim'):
   1066             if values.ndim != other.ndim or values.shape == other.shape[::-1]:

AttributeError: _ref_locs

After several tries I have not been able to locate the source of my error. If I perform the operation

df['head'] - df['torso']

it returns a DataFrame with the correct result. However, when I try to assign this DataFrame to df['head'], I get the error shown above.
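One way to sidestep assigning a sub-DataFrame to a top-level label is to address the target columns by their full (joint, attrib) tuples and assign plain NumPy values, so no label alignment happens on the right-hand side. A minimal sketch on a small deterministic stand-in frame; subtract_joint is a hypothetical helper name, and this has not been checked against pandas 0.12:

```python
import numpy as np
import pandas as pd

joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')
columns = pd.MultiIndex.from_product([joints, attribs], names=['joint', 'attrib'])
# Small deterministic stand-in for the question's random frame
df = pd.DataFrame(np.arange(18, dtype=float).reshape(2, 9), columns=columns)


def subtract_joint(df, target, reference):
    """Hypothetical helper: in place, df[target] -= df[reference] for every attribute."""
    attrib_list = list(df[reference].columns)
    keys = [(target, a) for a in attrib_list]
    # Select both blocks in the same attribute order, then strip the labels with
    # .to_numpy() so the assignment below is purely positional (no alignment).
    diff = df[target][attrib_list].to_numpy() - df[reference][attrib_list].to_numpy()
    df.loc[:, keys] = diff


subtract_joint(df, 'head', 'torso')
print(df['head'])  # every entry is -6.0 for this stand-in data
```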

Is there any way to perform this assignment?

Moreover, I was wondering whether there is a better way to perform the same normalization than the one I am trying. Perhaps using groupby and then applying the normalization function to the selected DataFrame?
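Since the normalization is a row-wise subtraction rather than a grouping, groupby isn't really needed here. One loop-free alternative (a sketch, not the only option) is to pull the target joints into a 3-D NumPy array of shape (rows, joints, attributes) and let broadcasting subtract the reference joint from all of them at once. normalize_all is a hypothetical name, and the sketch assumes every joint has the same attribute columns:

```python
import numpy as np
import pandas as pd


def normalize_all(df, joints, attribs, from_joint):
    """Hypothetical helper: return a copy of df in which every joint except
    from_joint has the from_joint coordinates subtracted from it."""
    targets = [j for j in joints if j != from_joint]
    keys = [(j, a) for j in targets for a in attribs]

    # (rows, targets * attribs) -> (rows, targets, attribs)
    block = df[keys].to_numpy().reshape(len(df), len(targets), len(attribs))
    # (rows, attribs) -> (rows, 1, attribs), so it broadcasts over the joint axis
    ref = df[[(from_joint, a) for a in attribs]].to_numpy()[:, np.newaxis, :]

    out = df.copy()
    out.loc[:, keys] = (block - ref).reshape(len(df), -1)
    return out


joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')
columns = pd.MultiIndex.from_product([joints, attribs], names=['joint', 'attrib'])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(2, 9), columns=columns)

result = normalize_all(df, joints, attribs, 'torso')
print(result)
```

Because only the listed (joint, attrib) columns are written, extra blocks such as header and pose pass through unchanged, and the original frame is left untouched.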


This error occurred with numpy 1.6 and pandas 0.12.

After upgrading to numpy 1.8 and pandas 0.13, the following operation works:

df['head'] = df['head'] - df['torso']


Answers (2)



I believe that I have found a rather simple solution:

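def normalize(df, from_joint):
    # Drop the non-joint blocks and the reference joint, then subtract the reference
    # joint's columns, broadcasting across the 'attrib' level (level 1).
    return df.drop(['header', 'pose', from_joint], axis=1, level='joint').sub(df[from_joint], level=1)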

df.update(normalize(df, 'torso'))
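The level=1 argument in normalize is what makes the subtraction line up: sub matches the reference frame's flat pos_x/pos_y/pos_z labels against the second (attrib) level of the MultiIndex and repeats the subtraction for every joint block. A tiny self-contained illustration with toy joints 'a' and 'b' (made-up data, assuming a reasonably recent pandas):

```python
import pandas as pd

# Toy frame: two joints ('a', 'b'), each with attributes 'x' and 'y'
columns = pd.MultiIndex.from_tuples(
    [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')], names=['joint', 'attrib'])
wide = pd.DataFrame([[1.0, 2.0, 10.0, 20.0]], columns=columns)
ref = pd.DataFrame([[1.0, 2.0]], columns=['x', 'y'])

# level=1 matches ref's flat 'x'/'y' labels against the 'attrib' level,
# so ref is subtracted from the 'a' block and from the 'b' block.
out = wide.sub(ref, level=1)
print(out)
```

df.update then copies those differences back into the matching (joint, attrib) columns of the original frame, leaving torso, header and pose as they were.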


Alvaro Fuentes

The problem is that your columns are a MultiIndex, so address the target columns by their full (joint, attrib) tuples. Try this:

def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        keys = [(j,c) for c in attribs]
        df[keys] = df[j] - df[from_joint]

print(df)
normalize_joints(df, 'torso')
print(df)


joint     header                          head                          neck                         torso                          pose
attrib  h_seqNum   h_stamp   user_id     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z      pose
0       0.067366  0.957394  0.983969  0.602662  0.505270  0.990675  0.753841  0.598397  0.846479  0.757155  0.220009  0.328470  0.686525
1       0.806405  0.800388  0.302178  0.935559  0.180360  0.322767  0.230457  0.617555  0.602589  0.109482  0.181803  0.311266  0.929481
2       0.649677  0.237286  0.963088  0.370463  0.471590  0.489256  0.060383  0.070885  0.858312  0.306232  0.511731  0.257015  0.283287
3       0.054800  0.127925  0.099985  0.700160  0.211256  0.026782  0.820380  0.922593  0.600130  0.100745  0.418157  0.869735  0.597275
4       0.678372  0.334520  0.247894  0.616133  0.914610  0.229628  0.317488  0.224910  0.620222  0.952499  0.946568  0.539502  0.838473
joint     header                          head                          neck                         torso                          pose
attrib  h_seqNum   h_stamp   user_id     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z      pose
0       0.067366  0.957394  0.983969 -0.154493  0.285261  0.662205 -0.003314  0.378387  0.518009  0.757155  0.220009  0.328470  0.686525
1       0.806405  0.800388  0.302178  0.826077 -0.001443  0.011501  0.120975  0.435752  0.291322  0.109482  0.181803  0.311266  0.929481
2       0.649677  0.237286  0.963088  0.064231 -0.040141  0.232241 -0.245850 -0.440846  0.601297  0.306232  0.511731  0.257015  0.283287
3       0.054800  0.127925  0.099985  0.599414 -0.206900 -0.842953  0.719635  0.504436 -0.269605  0.100745  0.418157  0.869735  0.597275
4       0.678372  0.334520  0.247894 -0.336366 -0.031958 -0.309874 -0.635011 -0.721658  0.080719  0.952499  0.946568  0.539502  0.838473
