RROBINSON

Reputation: 191

Python - Save dataframe to CSV "too many indices for array" error

I am trying to save a dataframe as a CSV and get a "too many indices for array" error. The code used for the save is:

df.to_csv('CCS_Matrix.csv')

The dataframe looks like this

   Var10  Var100  Var101
0      0       1       1
1      0       0       1
2      0       1       0

There are 250 columns and about 10 million rows in the dataset.

The dtypes for the dataframe are

Var10     int64
Var100    int64
Var101    int64
etc.

All the dtypes are the same for the 250 columns.

Here is the full output of the error message

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-16-37cbe55e6c0d> in <module>()
----> 1 df.to_csv('CCS_Matrix.csv', encoding='utf-8')

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   1401                                      doublequote=doublequote,
   1402                                      escapechar=escapechar, decimal=decimal)
-> 1403         formatter.save()
   1404 
   1405         if path_or_buf is None:

~/anaconda3/lib/python3.6/site-packages/pandas/io/formats/format.py in save(self)
   1590                 self.writer = csv.writer(f, **writer_kwargs)
   1591 
-> 1592             self._save()
   1593 
   1594         finally:

~/anaconda3/lib/python3.6/site-packages/pandas/io/formats/format.py in _save(self)
   1691                 break
   1692 
-> 1693             self._save_chunk(start_i, end_i)
   1694 
   1695     def _save_chunk(self, start_i, end_i):

~/anaconda3/lib/python3.6/site-packages/pandas/io/formats/format.py in _save_chunk(self, start_i, end_i)
   1705                                   decimal=self.decimal,
   1706                                   date_format=self.date_format,
-> 1707                                   quoting=self.quoting)
   1708 
   1709             for col_loc, col in zip(b.mgr_locs, d):

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in to_native_types(self, slicer, na_rep, quoting, **kwargs)
    611         values = self.values
    612         if slicer is not None:
--> 613             values = values[:, slicer]
    614         mask = isnull(values)
    615 

~/anaconda3/lib/python3.6/site-packages/pandas/core/sparse/array.py in __getitem__(self, key)
    417             return self._get_val_at(key)
    418         elif isinstance(key, tuple):
--> 419             data_slice = self.values[key]
    420         else:
    421             if isinstance(key, SparseArray):

IndexError: too many indices for array

Upvotes: 1

Views: 829

Answers (2)

Hannah Lindsley

Reputation: 96

Could you print out type(df)? I've seen this problem with SparseDataFrames, as noted here.

I was able to solve the problem by calling .to_dense() on the SparseDataFrame, yielding a traditional DataFrame. Worked fine after that. Clearly that's not ideal for memory reasons, but at least it works in the short term.
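
For reference, here is a minimal sketch of that workaround, assuming a pandas version where SparseDataFrame still exists (as in the traceback above); the column names and file name are just placeholders:

import pandas as pd

# Build a small sparse frame similar to the one in the question
sdf = pd.DataFrame({'Var10': [0, 0, 0], 'Var100': [1, 0, 1], 'Var101': [1, 1, 0]}).to_sparse()
print(type(sdf))  # something like <class 'pandas.core.sparse.frame.SparseDataFrame'>

# sdf.to_csv('CCS_Matrix.csv')  # this is the call that fails with "too many indices for array"

# Convert to a regular (dense) DataFrame first; uses more memory, but to_csv then works
sdf.to_dense().to_csv('CCS_Matrix.csv', encoding='utf-8')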

The pandas team has responded that it is indeed a bug.

Upvotes: 4

Yury Wallet

Reputation: 1660

If you try another way of saving, such as df.toCSV('name.csv'), you will just get a different error message ('SparseDataFrame' object has no attribute 'toCSV'), which confirms the object is a SparseDataFrame rather than a regular DataFrame. The problem is solved by converting it to a dense dataframe before saving:

df.to_dense().to_csv("submission.csv", index = False, sep=',', encoding='utf-8')
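
Note that in newer pandas releases (1.0 and later) SparseDataFrame has been removed; sparse data lives in a regular DataFrame with SparseDtype columns, and the rough equivalent of the conversion above would be something like:

# pandas >= 1.0: DataFrame with SparseDtype columns, using the .sparse accessor
df.sparse.to_dense().to_csv("submission.csv", index=False, sep=',', encoding='utf-8')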

Upvotes: 1
