Wessi
Wessi

Reputation: 1822

Rounding up one column in pandas dataframe

I have a pandas dataframe df that looks like this:

          no_obs  price_cleaning  house_size
0         1             585          30
1         1             585          40
2         1             585          43
3         1             650          43
4         1             633          44
5         1             650          45
6         2             585          50
7         1             633          50
8         1             650          50
9         2             750          50 

I want to round up the values in the price_cleaning column with this function:

def roundup(x): return int(math.ceil(x / 10.0)) * 10

I have tried the solution from this answer (Applying function to Pandas dataframe by column):

cols = [col for col in df.columns if col != 'price_cleaning'] df[cols] = df[cols].apply(roundup)

I get the following error: TypeError: ("cannot convert the series to ", 'occurred at index no_obs')

Can anyone help me understand why this is not working? How do I apply the roundup function to the column? Any help is much appreciated.

Upvotes: 2

Views: 7526

Answers (4)

Zero
Zero

Reputation: 77027

I'd vectorize like

In [298]: df['p'] = (np.ceil(df.price_cleaning / 10) * 10).astype(int)

In [299]: df
Out[299]:
   no_obs  price_cleaning  house_size    p
0       1             585          30  590
1       1             585          40  590
2       1             585          43  590
3       1             650          43  650
4       1             633          44  640
5       1             650          45  650
6       2             585          50  590
7       1             633          50  640
8       1             650          50  650
9       2             750          50  750

For 10K rows, timings - vectorized method is ~15x times faster to apply

In [331]: %timeit (np.ceil(dff.price_cleaning / 10) * 10).astype(int)
1000 loops, best of 3: 436 µs per loop

In [332]: %timeit dff['price_cleaning'].apply(roundup)
100 loops, best of 3: 7.86 ms per loop

In [333]: dff.shape
Out[333]: (10000, 4)

Atleast in this case, performance gap, with more rows, will increase.

Upvotes: 7

Zeugma
Zeugma

Reputation: 32125

You are filtering the columns upside down, do this instead:

cols = [col for col in  df.columns if col == 'price_cleaning']

Now, if you need to cleanup only one columns, then no need to create cols. Just do:

df['price_cleaning'] = df['price_cleaning'].apply(roundup)

Upvotes: 4

Fabio Lamanna
Fabio Lamanna

Reputation: 21584

I think you can use apply and lambda as:

In [6]: df['p'] = df['price_cleaning'].apply(lambda x: int(math.ceil(x / 10.0)) * 10)

In [7]: df
Out[7]: 
   no_obs  price_cleaning  house_size    p
0       1             585          30  590
1       1             585          40  590
2       1             585          43  590
3       1             650          43  650
4       1             633          44  640
5       1             650          45  650
6       2             585          50  590
7       1             633          50  640
8       1             650          50  650
9       2             750          50  750

Upvotes: 2

erasmortg
erasmortg

Reputation: 3278

This might work:

>>> df['price_cleaning_ceiling']= df.price_cleaning.apply(lambda x: int(math.ceil(x / 10.0)) * 10)

Upvotes: 3

Related Questions