Reputation: 1822
I have a pandas dataframe df
that looks like this:
no_obs price_cleaning house_size
0 1 585 30
1 1 585 40
2 1 585 43
3 1 650 43
4 1 633 44
5 1 650 45
6 2 585 50
7 1 633 50
8 1 650 50
9 2 750 50
I want to round up the values in the price_cleaning
column with this function:
def roundup(x):
return int(math.ceil(x / 10.0)) * 10
I have tried the solution from this answer (Applying function to Pandas dataframe by column):
cols = [col for col in df.columns if col != 'price_cleaning']
df[cols] = df[cols].apply(roundup)
I get the following error: TypeError: ("cannot convert the series to ", 'occurred at index no_obs')
Can anyone help me understand why this is not working? How do I apply the roundup function to the column? Any help is much appreciated.
Upvotes: 2
Views: 7526
Reputation: 77027
I'd vectorize like
In [298]: df['p'] = (np.ceil(df.price_cleaning / 10) * 10).astype(int)
In [299]: df
Out[299]:
no_obs price_cleaning house_size p
0 1 585 30 590
1 1 585 40 590
2 1 585 43 590
3 1 650 43 650
4 1 633 44 640
5 1 650 45 650
6 2 585 50 590
7 1 633 50 640
8 1 650 50 650
9 2 750 50 750
For 10K rows, timings - vectorized method is ~15x times faster to apply
In [331]: %timeit (np.ceil(dff.price_cleaning / 10) * 10).astype(int)
1000 loops, best of 3: 436 µs per loop
In [332]: %timeit dff['price_cleaning'].apply(roundup)
100 loops, best of 3: 7.86 ms per loop
In [333]: dff.shape
Out[333]: (10000, 4)
Atleast in this case, performance gap, with more rows, will increase.
Upvotes: 7
Reputation: 32125
You are filtering the columns upside down, do this instead:
cols = [col for col in df.columns if col == 'price_cleaning']
Now, if you need to cleanup only one columns, then no need to create cols
. Just do:
df['price_cleaning'] = df['price_cleaning'].apply(roundup)
Upvotes: 4
Reputation: 21584
I think you can use apply
and lambda
as:
In [6]: df['p'] = df['price_cleaning'].apply(lambda x: int(math.ceil(x / 10.0)) * 10)
In [7]: df
Out[7]:
no_obs price_cleaning house_size p
0 1 585 30 590
1 1 585 40 590
2 1 585 43 590
3 1 650 43 650
4 1 633 44 640
5 1 650 45 650
6 2 585 50 590
7 1 633 50 640
8 1 650 50 650
9 2 750 50 750
Upvotes: 2
Reputation: 3278
This might work:
>>> df['price_cleaning_ceiling']= df.price_cleaning.apply(lambda x: int(math.ceil(x / 10.0)) * 10)
Upvotes: 3