Chris

Reputation: 31206

convert pandas float series to int

I am discretizing my series for a learner. I really need the result to be an int series, and I really need to avoid for loops.

How do I convert this series from float to int?

Here is my function that is currently failing:

import numpy as np

def discretize_series(s, count, normalized=True):
    def discretize(value, bucket_size):  # helper defined but never called
        return value % bucket_size
    if normalized:
        maximum = 1.0
    else:
        minimum = np.min(s)
        s = s[:] - minimum
        maximum = np.max(s)
    bucket_size = maximum / float(count)
    # This is the line that causes the function to fail:
    s = int((s[:] - s[:] % bucket_size) / bucket_size)
    return s

The int() call induces a casting error (a TypeError): I am unable to cast the pandas Series as an int series, because the built-in int() only accepts a single value.

If I remove the int(), the function works, so I may just see if I can get it to work anyway.
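A minimal reproduction of the failure, assuming a made-up three-element Series (the values are illustrative, not from the question). The built-in int() raises TypeError on a multi-element Series, while the Series method astype(int) casts element by element.

```python
import pandas as pd

# Illustrative values, not taken from the question.
s = pd.Series([0.5, 1.5, 2.5])

error = None
try:
    int(s)  # built-in int() wants one scalar, not a multi-element Series
except TypeError as exc:
    error = exc

print("int(s) raised:", repr(error))

# astype(int) converts each element and truncates toward zero.
as_int = s.astype(int)
print(as_int.tolist())  # [0, 1, 2]
```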

Upvotes: 11

Views: 59280

Answers (2)

Curious Watcher

Reputation: 689

N.B. Because pandas is built on top of numpy, this pandas-based approach can be less efficient than working with numpy directly. Consider numpy if efficiency matters.

That said, a lot of work is typically done with pandas DataFrames, and converting to numpy means writing extra code. If you are doing an analysis in, say, a Jupyter notebook, it is reasonable to let pandas do a bit of work under the hood.

Big thank you to @Chris for noticing this.


pandas version (theoretically less efficient than numpy)

Create a list with float values:

import pandas as pd

y = [0.1234, 0.6789, 0.5678]

Convert the list of float values to a pandas Series:

s = pd.Series(data=y)

Round the values to three decimal places:

print(s.round(3))

prints

0    0.123
1    0.679
2    0.568
dtype: float64

Convert to integers:

print(s.astype(int))

prints

0    0
1    0
2    0
dtype: int64

Chain it all together:

pd.Series(data=y).round(3).astype(int)
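One caveat worth seeing concretely: astype(int) truncates toward zero, so rounding to three decimals before the cast does not change the integer result. Rounding to zero decimals first gives nearest-integer values instead. A small sketch using the same list y:

```python
import pandas as pd

y = [0.1234, 0.6789, 0.5678]
s = pd.Series(data=y)

# astype(int) drops the fractional part, so round(3) has no effect on the ints.
truncated = s.round(3).astype(int)
print(truncated.tolist())  # [0, 0, 0]

# Round to 0 decimals first to get nearest-integer results.
nearest = s.round().astype(int)
print(nearest.tolist())  # [0, 1, 1]
```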

Upvotes: 2

The regular Python int() function only works on scalars. To round the data, use a numpy function, either

s = np.round((s - s % bucket_size) / bucket_size)  # round to the nearest integer; or
s = np.fix((s - s % bucket_size) / bucket_size)    # round toward 0

and if you actually want to convert to an integer type, use

s = s.astype(int)

to cast your array.
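Folding the np.round and astype(int) suggestions back into the question's function gives roughly the following. This is a sketch, not the asker's final code: the unused inner discretize helper is dropped, normalized=True is assumed to mean the data already lies in [0, 1], and the sample inputs below are made up.

```python
import numpy as np
import pandas as pd


def discretize_series(s, count, normalized=True):
    """Map each value of s to an integer bucket index from 0 to count."""
    if normalized:
        maximum = 1.0  # assumption: s is already scaled to [0, 1]
    else:
        minimum = np.min(s)
        s = s - minimum
        maximum = np.max(s)
    bucket_size = maximum / float(count)
    # The quotient is whole-valued in exact arithmetic; round away any
    # floating-point error before casting instead of truncating it.
    buckets = np.round((s - s % bucket_size) / bucket_size)
    return buckets.astype(int)


print(discretize_series(pd.Series([0.0, 0.26, 0.5, 0.99, 1.0]), 4).tolist())
print(discretize_series(pd.Series([10.0, 12.5, 15.0, 20.0]), 4, normalized=False).tolist())
```

Rounding before the cast is the safer choice here: in floating point the quotient can come out a hair below a whole number, and truncating it (astype(int) alone, or np.fix) would then drop the value into the bucket below.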

Upvotes: 26
