Reputation: 31206
I am discretizing my series for a learner. I really need the series to be in int, and I really need to avoid for loops.
How do I convert this series from float to int?
Here is my function that is currently failing:
def discretize_series(s,count,normalized=True):
    def discretize(value,bucket_size):
        return value % bucket_size
    if normalized:
        maximum = 1.0
    else:
        minimum = np.min(s)
        s = s[:] - minimum
        maximum = np.max(s)
    bucket_size = maximum / float(count)
Here is the line that causes the function to fail:
    s = int((s[:] - s[:] % bucket_size)/bucket_size)
The int() induces a casting error: I am unable to cast the pandas series as an int series.
    return s
If I remove the int(), the function works, so I may just see if I can get it to work anyway.
Upvotes: 11
Views: 59280
Reputation: 689
N.B. This answer is the less efficient option, since pandas is built on top of numpy. Please consider using numpy directly if you are going for efficiency.
As for this answer, a significant amount of work is done using pandas data frames, so adding a conversion to numpy means writing extra code. If one is performing an analysis in, say, a Jupyter notebook, then we can surely let the library do a bit of work under the hood.
Big thank you to @Chris for noticing this.
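For comparison, here is a minimal sketch of what the numpy route mentioned above might look like. It assumes the data starts out as a pandas Series and uses Series.to_numpy() for the conversion; the variable names (arr, rounded, as_int) are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1234, 0.6789, 0.5678])

# Pull the underlying values out as a plain numpy array.
arr = s.to_numpy()

# Round to 3 decimals and cast to int directly on the array.
rounded = np.round(arr, 3)
as_int = arr.astype(int)

# Wrap the result back into a Series if the rest of the analysis uses pandas.
rounded_series = pd.Series(rounded, index=s.index)
print(rounded_series)
print(as_int)
```

The extra to_numpy() and pd.Series(...) steps are the "extra code" referred to above.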
pandas version (theoretically less efficient than numpy):
import pandas as pd
y = [0.1234, 0.6789, 0.5678]
pandas Series:
s = pd.Series(data=y)
print(s.round(3))
returns
0    0.123
1    0.679
2    0.568
dtype: float64
print(s.astype(int))
returns
0    0
1    0
2    0
dtype: int64
The same rounding as a single expression:
pd.Series(data=y).round(3)
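Note that astype(int) drops the fractional part rather than rounding, which is why every value above becomes 0. A small sketch of the difference, using made-up sample values that are not from the question:

```python
import pandas as pd

# Illustrative values only, including a negative one.
s = pd.Series([0.1234, 0.6789, 1.5678, -1.5678])

truncated = s.astype(int)        # discards the fraction (toward zero)
nearest = s.round().astype(int)  # rounds to the nearest integer, then casts

print(truncated.tolist())  # [0, 0, 1, -1]
print(nearest.tolist())    # [0, 1, 2, -2]
```

If the nearest integer is what you want, round first and cast afterwards.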
Upvotes: 2
Reputation: 35109
The regular Python int function only works for scalars. You should use a numpy function to round the data instead, either
s = np.round((s - s % bucket_size) / bucket_size)  # to round to the nearest integer; or
s = np.fix((s - s % bucket_size) / bucket_size)  # to round towards 0
and if you actually want to convert to an integer type, use
s = s.astype(int)
to cast your array.
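Putting the pieces together, the function from the question could be written roughly as follows. This is only a sketch: it assumes s is a pandas Series of floats, keeps the question's bucket formula unchanged, and drops the unused inner helper.

```python
import numpy as np
import pandas as pd


def discretize_series(s, count, normalized=True):
    """Map each value of s to an integer bucket index without a Python loop."""
    if normalized:
        # Assumes the values already lie in [0, 1].
        maximum = 1.0
    else:
        s = s - s.min()  # shift so the smallest value becomes 0
        maximum = s.max()
    bucket_size = maximum / float(count)
    # Vectorized: strip the remainder, divide, round away float noise, cast.
    buckets = np.round((s - s % bucket_size) / bucket_size)
    return buckets.astype(int)


s = pd.Series([0.05, 0.30, 0.55, 0.80])
print(discretize_series(s, 4).tolist())  # [0, 1, 2, 3]
```

With this formula a value exactly equal to the maximum lands in bucket count rather than count - 1, so clip the result if you need indices strictly below count.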
Upvotes: 26