Reputation: 5552
Given
import numpy as np
import pandas as pd
np.random.seed(1234)
df = pd.DataFrame({'A' : range(10), 'B' : np.random.randn(10), 'C' : np.random.randn(10)})
How can I round columns B and C to the nearest 0.25? This is what I tried:
def roundPartial (value, resolution):
    return round (value / resolution) * resolution
df[['B', 'C']].apply(roundPartial, 0.25)
But I get:
ValueError: No axis named 0.25 for object type <class 'pandas.core.frame.DataFrame'>
Upvotes: 2
Views: 928
Reputation: 863226
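If you need to apply the function roundPartial with extra arguments, you can wrap it in a lambda. The second positional argument of DataFrame.apply is axis, which is why 0.25 was rejected as an axis name: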
def roundPartial (value, resolution):
    return round (value / resolution) * resolution
print (df[['B', 'C']].apply(lambda x: roundPartial(x, 0.25)))
      B     C
0  0.50  1.25
1 -1.25  1.00
2  1.50  1.00
3 -0.25 -2.00
4 -0.75 -0.25
5  1.00  0.00
6  0.75  0.50
7 -0.75  0.25
8  0.00  1.25
9 -2.25 -1.50
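As an alternative to the lambda, apply can forward the extra argument itself. The sketch below assumes a reasonably recent pandas, where DataFrame.apply accepts an args tuple and passes unrecognised keyword arguments on to the function; the names via_args, via_kwargs and via_lambda are only illustrative.

```python
import numpy as np
import pandas as pd


def roundPartial(value, resolution):
    return round(value / resolution) * resolution


np.random.seed(1234)
df = pd.DataFrame({'A': range(10),
                   'B': np.random.randn(10),
                   'C': np.random.randn(10)})

# Extra positional arguments go in the `args` tuple; the second
# positional parameter of apply itself is `axis`.
via_args = df[['B', 'C']].apply(roundPartial, args=(0.25,))

# Keyword arguments that apply does not use itself are forwarded to the function.
via_kwargs = df[['B', 'C']].apply(roundPartial, resolution=0.25)

# The lambda form from the answer, for comparison.
via_lambda = df[['B', 'C']].apply(lambda x: roundPartial(x, 0.25))

print(via_args.equals(via_lambda) and via_kwargs.equals(via_lambda))
```

All three calls should produce the same frame, so the choice is mostly a matter of readability.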
Another solution uses the round method of each column:
print (df[['B', 'C']].apply(lambda x: (x / 0.25).round()* 0.25))
      B     C
0  0.50  1.25
1 -1.25  1.00
2  1.50  1.00
3 -0.25 -2.00
4 -0.75 -0.25
5  1.00  0.00
6  0.75  0.50
7 -0.75  0.25
8  0.00  1.25
9 -2.25 -1.50
But the fastest approach on a larger DataFrame is to skip apply entirely: divide the whole DataFrame by resolution with div, round, and multiply back with mul:
resolution = 0.25
print ((df[['B', 'C']].div(resolution)).round().mul(resolution))
#print ((df[['B', 'C']] / resolution).round() * resolution)
      B     C
0  0.50  1.25
1 -1.25  1.00
2  1.50  1.00
3 -0.25 -2.00
4 -0.75 -0.25
5  1.00  0.00
6  0.75  0.50
7 -0.75  0.25
8  0.00  1.25
9 -2.25 -1.50
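If this is needed in several places, the div/round/mul chain can be wrapped in a small helper. This is only a sketch: round_to_resolution is a hypothetical name, not a pandas function. It also shows one thing to keep in mind: round here rounds exact halves to the nearest even multiple rather than always up.

```python
import pandas as pd


def round_to_resolution(frame, resolution):
    """Round every value in frame to the nearest multiple of resolution."""
    return frame.div(resolution).round().mul(resolution)


# Values that sit exactly halfway between two multiples of 0.25.
ties = pd.DataFrame({'x': [0.125, 0.375, 0.625]})
rounded = round_to_resolution(ties, 0.25)

# 0.125 / 0.25 = 0.5 -> 0, 0.375 / 0.25 = 1.5 -> 2, 0.625 / 0.25 = 2.5 -> 2
print(rounded['x'].tolist())  # [0.0, 0.5, 0.5]
```

For random floats such as those in df this tie case practically never occurs, but it can matter for data that is already quantised.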
Timings with len(df) = 100k:
df = pd.concat([df]*10000).reset_index(drop=True)
In [125]: %timeit (df[['B', 'C']].apply(lambda x: (x / resolution).round()* resolution))
10 loops, best of 3: 29 ms per loop
In [126]: %timeit ((df[['B', 'C']] / resolution).round() * resolution)
10 loops, best of 3: 22.5 ms per loop
In [127]: %timeit ((df[['B', 'C']].div(resolution)).round().mul(resolution))
10 loops, best of 3: 22.6 ms per loop
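Outside IPython, a comparison like the one above can be reproduced with the standard timeit module. This is a rough sketch; the candidates dictionary and its labels are made up for illustration, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(1234)
small = pd.DataFrame({'A': range(10),
                      'B': np.random.randn(10),
                      'C': np.random.randn(10)})
big = pd.concat([small] * 10000).reset_index(drop=True)  # 100k rows
resolution = 0.25

candidates = {
    'apply + lambda': lambda: big[['B', 'C']].apply(
        lambda x: (x / resolution).round() * resolution),
    'operators': lambda: (big[['B', 'C']] / resolution).round() * resolution,
    'div / mul': lambda: big[['B', 'C']].div(resolution).round().mul(resolution),
}

# Run each variant once to check that they agree before timing them.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    # Best of 3 repeats, 10 calls per repeat, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{name}: {best * 1e3:.1f} ms per loop')
```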
Upvotes: 2