Reputation: 49
When I try to add a new column to a large DataFrame, I get an IndexError. Can anyone help me with this error?
>vec
0 1 2 3 4 5 6
V1.UC8.0 0 0 0 0 0 0 0
V1.UC48.0 0 0 0 0 0 0 0
7 8 9 ... 2546531 2546532 2546533
V1.UC8.0 0 0 0 ... 0 0 0
V1.UC48.0 0 0 0 ... 0 0 0
2546534 2546535 2546536 2546537 2546538 2546539 2546540
V1.UC8.0 0 0 0 0 0 0 0
V1.UC48.0 0 0 0 0 0 0 0
[2 rows x 2546541 columns]
> vec['ToDrop']=0
IndexError Traceback (most recent call last)
<ipython-input-40-9868611037ed> in <module>()
----> 1 vec['ToDrop']=0
C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
2115 else:
2116 # set column
-> 2117 self._set_item(key, value)
2118
2119 def _setitem_slice(self, key, value):
C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
2193 self._ensure_valid_index(value)
2194 value = self._sanitize_column(key, value)
-> 2195 NDFrame._set_item(self, key, value)
2196
2197 # check if we are modifying a copy
C:\Anaconda\lib\site-packages\pandas\core\generic.pyc in _set_item(self, key, value)
1188
1189 def _set_item(self, key, value):
-> 1190 self._data.set(key, value)
1191 self._clear_item_cache()
1192
C:\Anaconda\lib\site-packages\pandas\core\internals.pyc in set(self, item, value, check)
2970
2971 try:
-> 2972 loc = self.items.get_loc(item)
2973 except KeyError:
2974 # This item wasn't present, just insert at end
C:\Anaconda\lib\site-packages\pandas\core\index.pyc in get_loc(self, key, method)
1435 """
1436 if method is None:
-> 1437 return self._engine.get_loc(_values_from_object(key))
1438
1439 indexer = self.get_indexer([key], method=method)
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3578)()
pandas\src\util.pxd in util.get_value_at (pandas\index.c:15287)()
IndexError: index out of bounds
I also tried adding a new row to the transposed DataFrame (vec.T), but got the same error.
Upvotes: 2
Views: 1707
Reputation: 6756
This is very weird indeed.
You can use something like this as a workaround:
vec = pd.merge(vec, pd.DataFrame([0, 0], columns=["new"]), right_index=True, left_index=True) # Optional: pass copy=False
Make sure your new 1-column dataframe has the same index as vec.
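For illustration, here is a minimal sketch of that workaround on a small stand-in frame. The 2x5 shape is an assumption made only to keep the example quick to run; the row labels and the ToDrop column name come from the question. The point it shows is building the one-column frame on vec's own index, so the index-on-index merge lines up:

```python
import numpy as np
import pandas as pd

# Small stand-in for the OP's frame (the real one has 2546541 columns).
vec = pd.DataFrame(np.zeros((2, 5), dtype=int),
                   index=["V1.UC8.0", "V1.UC48.0"])

# Build the one-column frame on vec's own index; a default 0..n-1 index
# would not match these string row labels.
new_col = pd.DataFrame({"ToDrop": 0}, index=vec.index)

# Join on the index of both frames, as in the one-liner above.
vec = pd.merge(vec, new_col, left_index=True, right_index=True)
print(vec)
```

I have not verified this on the OP's pandas version with the full-width frame, so treat it as a sketch rather than a confirmed fix.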
More on why this is weird (hopefully someone can provide a proper answer):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.zeros((2, 2546540)))
df[2546540] = 0
Output: IndexError, as in the OP's post.
df["blah"] = 0
Output:
TypeError: unorderable types: numpy.ndarray() < str()
Meanwhile, everything's fine with a small dataframe:
df = pd.DataFrame(np.zeros((2, 200)))
df[200] = 0
Output exactly as expected:
0 1 2 3 4 5 6 7 8 9 ... 191 192 193 194
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0
195 196 197 198 199 200
0 0 0 0 0 0 0
1 0 0 0 0 0 0
[2 rows x 201 columns]
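To narrow down where the behavior changes, the small-versus-large comparison can be wrapped in a helper and run at different widths. This is only a sketch: can_add_column is a hypothetical name, not a pandas function, and the exception types it catches are just the two reported in this thread.

```python
import numpy as np
import pandas as pd

def can_add_column(n_cols):
    """Hypothetical helper: try to append one column to a 2 x n_cols frame."""
    df = pd.DataFrame(np.zeros((2, n_cols)))
    try:
        # Same assignment as in the snippets above, with the next integer label.
        df[n_cols] = 0
    except (IndexError, TypeError) as exc:
        print("n_cols=%d failed: %r" % (n_cols, exc))
        return False
    return df.shape == (2, n_cols + 1)

# Small case from this answer; pass 2546540 to retry the failing case.
print(can_add_column(200))
```

Calling it with widths between 200 and 2546540 should show roughly where the assignment starts to fail on a given pandas version.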
Hope this helps, and that someone can explain this pandas behavior.
Upvotes: 3