Reputation: 1355
I am transforming a variable from a pandas dataframe and would then like to replace the column with the new values. The error says that, after the transformation, the length of the array does not match the length of my dataframe's index. I don't think that is true, though.
>>> df['variable'] = stats.boxcox(df.variable)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2119, in __setitem__
self._set_item(key, value)
File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2165, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2205, in _sanitize_column
raise AssertionError('Length of values does not match '
AssertionError: Length of values does not match length of index
When I check the lengths, they seem to disagree. Calling len() on the result of stats.boxcox gives 2, but when I print the result it reports Length: 50000. What is going on here?
>>> len(df)
50000
>>> len(stats.boxcox(df.variable))
2
>>> stats.boxcox(df.variable)
(0 -0.079496
1 -0.117982
2 -0.104637
...
49985 -0.041300
49986 0.651771
49987 -0.115660
49988 -0.118034
49998 -0.118014
49999 -0.034076
Name: feat9, Length: 50000, dtype: float64, 8.4721358117221772)
>>>
Upvotes: 3
Views: 4602
Reputation: 251365
You can see in your example that the result of boxcox is a tuple. This is consistent with the documentation, which indicates that boxcox returns a tuple of the transformed data and a lambda value. Notice that the example on that page does:
xt, _ = stats.boxcox(x)
... showing again that boxcox returns a 2-tuple.
You should be doing df['variable'] = stats.boxcox(df.variable)[0] so that only the transformed data, the first element of the tuple, is assigned to the column.
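As a rough, self-contained sketch of the same fix: the dataframe below is made-up data (random lognormal values, since Box-Cox needs strictly positive input), and the names `result`, `transformed`, and `fitted_lambda` are only illustrative. It assumes a reasonably recent NumPy, pandas, and SciPy, and passes a plain array via `.to_numpy()` to keep the input type unambiguous.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the asker's data; Box-Cox requires values > 0.
rng = np.random.default_rng(0)
df = pd.DataFrame({"variable": rng.lognormal(size=1000)})

# With no lambda supplied, boxcox returns (transformed_data, fitted_lambda),
# so len(result) is 2 regardless of how many rows were passed in.
result = stats.boxcox(df["variable"].to_numpy())

# Unpack the tuple instead of assigning it to the column directly.
transformed, fitted_lambda = result

# The transformed data has one value per row, so the assignment now works.
df["variable"] = transformed
```

Unpacking into two names rather than indexing with [0] also keeps the fitted lambda around, which is needed if the transformation has to be inverted later.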
Upvotes: 11