Pandas IndexError for large dataframe

Question

When I try to add a new column to a large DataFrame, I get an IndexError. Can anyone help me with this error?

>vec
                 0        1        2        3        4        5        6 
V1.UC8.0         0        0        0        0        0        0        0   
V1.UC48.0        0        0        0        0        0        0        0   

                 7        8        9         ...     2546531  2546532  2546533  
V1.UC8.0         0        0        0   ...           0        0        0   
V1.UC48.0        0        0        0   ...           0        0        0   

               2546534  2546535  2546536  2546537  2546538  2546539  2546540  
V1.UC8.0         0        0        0        0        0        0        0  
V1.UC48.0        0        0        0        0        0        0        0  

[2 rows x 2546541 columns]

> vec['ToDrop']=0


    IndexError                                Traceback (most recent call last)
 in ()
----> 1 vec['ToDrop']=0

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
   2115         else:
   2116             # set column
-> 2117             self._set_item(key, value)
   2118 
   2119     def _setitem_slice(self, key, value):

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
   2193         self._ensure_valid_index(value)
   2194         value = self._sanitize_column(key, value)
-> 2195         NDFrame._set_item(self, key, value)
   2196 
   2197         # check if we are modifying a copy

C:\Anaconda\lib\site-packages\pandas\core\generic.pyc in _set_item(self, key, value)
   1188 
   1189     def _set_item(self, key, value):
-> 1190         self._data.set(key, value)
   1191         self._clear_item_cache()
   1192 

C:\Anaconda\lib\site-packages\pandas\core\internals.pyc in set(self, item, value, check)
   2970 
   2971         try:
-> 2972             loc = self.items.get_loc(item)
   2973         except KeyError:
   2974             # This item wasn't present, just insert at end

C:\Anaconda\lib\site-packages\pandas\core\index.pyc in get_loc(self, key, method)
   1435         """
   1436         if method is None:
-> 1437             return self._engine.get_loc(_values_from_object(key))
   1438 
   1439         indexer = self.get_indexer([key], method=method)

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3578)()

pandas\src\util.pxd in util.get_value_at (pandas\index.c:15287)()

IndexError: index out of bounds

I also tried adding a new row to the transposed DataFrame (vec.T), but got the same error.

ldirer · Accepted Answer

This is very weird indeed.

You can use something like this as a workaround:

vec = pd.merge(vec, pd.DataFrame([0, 0], columns=["new"]), right_index=True, left_index=True)  # Optional: pass copy=False

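Make sure your new one-column DataFrame has the same index as vec (for example, by passing index=vec.index when you build it); otherwise the merge on the index will not line up the rows.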
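To make the index requirement concrete, here is a minimal sketch on a small stand-in frame. The 5-column size and the variable name new_col are assumptions for illustration only; the row labels and the ToDrop column name come from the question. I have not checked whether this avoids the error on the full 2546541-column frame.

```python
import numpy as np
import pandas as pd

# Small stand-in for the OP's frame (assumption: 5 columns instead of 2546541).
vec = pd.DataFrame(np.zeros((2, 5)), index=["V1.UC8.0", "V1.UC48.0"])

# One-column frame that reuses vec's index, so the index-on-index merge
# finds a partner for every row of vec.
new_col = pd.DataFrame(0, index=vec.index, columns=["ToDrop"])

merged = pd.merge(vec, new_col, left_index=True, right_index=True)
print(merged.shape)  # (2, 6)
```

Building the new frame from vec.index, rather than from a bare list, is what keeps the row labels identical on both sides of the merge.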

More on why this is weird (hopefully someone can provide a proper answer):

import numpy as np
import pandas as pd
df = pd.DataFrame(np.zeros((2, 2546540)))
df[2546540] = 0

Output: IndexError as in the OP's post.

df["blah"] = 0

Output:

TypeError: unorderable types: numpy.ndarray() < str()

Meanwhile, everything's fine with a small dataframe:

df = pd.DataFrame(np.zeros((2, 200)))
df[200] = 0

Output exactly as expected:

   0    1    2    3    4    5    6    7    8    9   ...   191  192  193  194  
0    0    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   
1    0    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   

   195  196  197  198  199  200  
0    0    0    0    0    0    0  
1    0    0    0    0    0    0  

[2 rows x 201 columns]
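If you want to check which sizes are affected on your own installation, the two experiments above can be wrapped in one helper. This is only a sketch: the function name try_add_column is made up for illustration, and the result will depend on your pandas version.

```python
import numpy as np
import pandas as pd


def try_add_column(n_cols):
    """Build a 2 x n_cols frame of zeros and assign one more integer-labelled column.

    Returns the new shape on success, or the exception that was raised.
    """
    df = pd.DataFrame(np.zeros((2, n_cols)))
    try:
        df[n_cols] = 0  # same pattern as df[200] = 0 above
    except (IndexError, TypeError) as exc:
        return exc
    return df.shape


print(pd.__version__)
print(try_add_column(200))  # the small case from above
# try_add_column(2546540) is the large case from above; it needs noticeably
# more time and memory, so it is left commented out here.
```

Printing pd.__version__ next to the result makes it easier to compare runs across installations.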

Hope this helps and someone can explain this behavior of pandas.
