Reputation: 2044
I have a dataframe df which is sparse, and for memory efficiency I wish to convert it using to_sparse(). However, it seems that the new representation ends up with dtype=float64, even when my df is dtype=int8.
Is there a way to specify the data type, or to prevent the automatic conversion to dtype=float64, when using to_sparse()?
Upvotes: 1
Views: 155
Reputation: 2044
Looking under the hood at the Pandas sparse frame implementation in pandas.sparse.frame, we see that the astype() method is still waiting to be implemented as of release 0.18.0. Ref. GitHub
Once an implementation is in place, conversion of dtype should work as it does in pandas.core.frame (the Pandas DataFrame). Given a Pandas DataFrame df, we could convert it to a SparseDataFrame and specify the dtype:
df.to_sparse().astype(dtype)
At the moment, SparseDataFrame does not have much support for dtype, but it is currently being developed. Refer to this issue that I opened on GitHub.
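For readers on a much later pandas than the 0.18.0 discussed here, a minimal sketch of the same idea follows. It assumes pandas 1.0 or newer, where to_sparse() and SparseDataFrame no longer exist and sparse data is stored in ordinary DataFrame columns with a SparseDtype. The frame contents and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mostly-zero int8 data standing in for the question's df.
data = np.zeros((1000, 4), dtype="int8")
data[::100, 0] = 1  # a handful of non-zero entries
df = pd.DataFrame(data, columns=list("abcd"))

# Name the int8 subtype explicitly and use 0 (not NaN) as the fill value,
# so the sparse columns do not need a float representation.
sparse_df = df.astype(pd.SparseDtype("int8", fill_value=0))

print(sparse_df.dtypes)          # per-column sparse dtypes
print(sparse_df.sparse.density)  # fraction of values actually stored
```

Choosing 0 as the fill value matters here: a NaN fill value cannot be represented in an integer array, so an int8 subtype only makes sense with an integer fill value.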
Upvotes: 0
Reputation: 32224
You see, dtypes are not a pandas-controlled entity. Dtypes are typically a numpy thing. Dtypes are not controllable in any way; they are automagically asserted by numpy and can only change when you change the data inside the dataframe or numpy array.
That being said, the typical reason for ending up with a float instead of an int as a dtype is the introduction of NaN values into the series or numpy array. Some call this a pandas gotcha. I personally would argue it is due to the (too) close coupling between pandas and numpy.
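The NaN effect described above can be reproduced in a few lines. This is only an illustrative sketch with made-up values, not the asker's data: an int8 series is reindexed onto a label it does not contain, which introduces a missing value and forces a float dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 0, 3], dtype="int8")
print(s.dtype)  # int8

# Label 3 does not exist in s, so reindexing fills it with NaN.
# NaN has no integer representation, so the result is upcast to a float dtype.
widened = s.reindex([0, 1, 2, 3])
print(widened.dtype)

# The same happens in plain numpy when NaN is mixed with integers.
arr = np.array([1, 0, np.nan])
print(arr.dtype)  # float64
```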
In general, dtypes should never be trusted for anything; they are incredibly unreliable. I think everyone working with numpy/pandas would live a better life if they were never exposed to dtypes at all.
If you really really hate floats, the only other option for you as far as I know is to use string representations, which of course causes even more problems in most cases.
Upvotes: 1