Reputation: 191
I am trying to save a dataframe as CSV and get a "too many indices for array" error. The code used for the save is:
df.to_csv('CCS_Matrix.csv')
The dataframe looks like this:
   Var10  Var100  Var101
0      0       1       1
1      0       0       1
2      0       1       0
There are 250 columns and about 10 million rows in the dataset.
The dtypes for the dataframe are
Var10 int64
Var100 int64
Var101 int64
etc.
All the dtypes are the same for the 250 columns.
Here is the full output of the error message
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-16-37cbe55e6c0d> in <module>()
----> 1 df.to_csv('CCS_Matrix.csv', encoding='utf-8')
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
1401 doublequote=doublequote,
1402 escapechar=escapechar, decimal=decimal)
-> 1403 formatter.save()
1404
1405 if path_or_buf is None:
~/anaconda3/lib/python3.6/site-packages/pandas/io/formats/format.py in save(self)
1590 self.writer = csv.writer(f, **writer_kwargs)
1591
-> 1592 self._save()
1593
1594 finally:
~/anaconda3/lib/python3.6/site-packages/pandas/io/formats/format.py in _save(self)
1691 break
1692
-> 1693 self._save_chunk(start_i, end_i)
1694
1695 def _save_chunk(self, start_i, end_i):
~/anaconda3/lib/python3.6/site-packages/pandas/io/formats/format.py in _save_chunk(self, start_i, end_i)
1705 decimal=self.decimal,
1706 date_format=self.date_format,
-> 1707 quoting=self.quoting)
1708
1709 for col_loc, col in zip(b.mgr_locs, d):
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in to_native_types(self, slicer, na_rep, quoting, **kwargs)
611 values = self.values
612 if slicer is not None:
--> 613 values = values[:, slicer]
614 mask = isnull(values)
615
~/anaconda3/lib/python3.6/site-packages/pandas/core/sparse/array.py in __getitem__(self, key)
417 return self._get_val_at(key)
418 elif isinstance(key, tuple):
--> 419 data_slice = self.values[key]
420 else:
421 if isinstance(key, SparseArray):
IndexError: too many indices for array
Upvotes: 1
Views: 829
Reputation: 96
Could you print out type(df)? I've seen this problem with SparseDataFrames, as noted here.
I was able to solve the problem by calling .to_dense() on the SparseDataFrame, yielding a traditional DataFrame; to_csv() worked fine after that. Clearly that's not ideal for memory reasons, but at least it works in the short term.
The pandas team has responded that it is indeed a bug.
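A minimal sketch of that workaround, assuming an older pandas (roughly 0.20-0.24) where pd.SparseDataFrame still exists; the tiny example data just mirrors the 0/1 columns and file name from the question:
import pandas as pd

# Tiny stand-in for the question's 250-column, 10-million-row sparse frame
sdf = pd.SparseDataFrame(
    {'Var10': [0, 0, 0], 'Var100': [1, 0, 1], 'Var101': [1, 1, 0]},
    default_fill_value=0)

print(type(sdf))  # should show SparseDataFrame, not a plain DataFrame

# sdf.to_csv('CCS_Matrix.csv')  # raises IndexError: too many indices for array in affected versions
sdf.to_dense().to_csv('CCS_Matrix.csv')  # densify first, then save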
Upvotes: 4
Reputation: 1660
You can try another option to save as CSV, .toCSV('name.csv'), but that just gives a different error ('SparseDataFrame' object has no attribute 'toCSV'), since no such method exists. The problem was solved by converting the sparse dataframe to a dense dataframe first:
df.to_dense().to_csv("submission.csv", index=False, sep=',', encoding='utf-8')
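If you are on a newer pandas (1.0+), SparseDataFrame has been removed and sparse data lives in ordinary DataFrames with SparseDtype columns; the equivalent densify-then-save step there would look roughly like this (column names and file name are just illustrative):
import pandas as pd

# DataFrame whose columns hold sparse values (SparseDtype), the modern replacement for SparseDataFrame
df = pd.DataFrame({
    'Var10': pd.arrays.SparseArray([0, 0, 0], fill_value=0),
    'Var100': pd.arrays.SparseArray([1, 0, 1], fill_value=0)})

# Convert every sparse column back to dense, then write the CSV
df.sparse.to_dense().to_csv('submission.csv', index=False)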
Upvotes: 1