Pablito

Reputation: 97

Want to create a function with def, but ValueError returned

What I wanna do

I want to do RFM analysis on purchase data from an e-commerce site.

I processed the data into RFM format, so I want to rank every ID depending on the values of each column (Money, Recency and Frequency).

However, I got the error message below.

 ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-e7bf5ddc856d> in <module>
     13         return 5
     14 
---> 15 rfm['money rank'] = rfm['money'].apply(money)
     16 rfm.head()

c:\users\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7766             kwds=kwds,
   7767         )
-> 7768         return op.get_result()
   7769 
   7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:

c:\users\lib\site-packages\pandas\core\apply.py in get_result(self)
    183             return self.apply_raw()
    184 
--> 185         return self.apply_standard()
    186 
    187     def apply_empty_result(self):

c:\users\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    274 
    275     def apply_standard(self):
--> 276         results, res_index = self.apply_series_generator()
    277 
    278         # wrap results

c:\users\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    288             for i, v in enumerate(series_gen):
    289                 # ignore SettingWithCopy here in case the user mutates
--> 290                 results[i] = self.f(v)
    291                 if isinstance(results[i], ABCSeries):
    292                     # If we have a view on v, we need to make a copy because

<ipython-input-15-e7bf5ddc856d> in money(a)
      1 def money(a):
----> 2     if a < 1000:
      3         return 0
      4     if (1000 <= a) & (a < 2000):
      5         return 1

c:\users\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1440     @final
   1441     def __nonzero__(self):
-> 1442         raise ValueError(
   1443             f"The truth value of a {type(self).__name__} is ambiguous. "
   1444             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). 

Data

```
    money    recency    frequency
sum    <lambda>    len
ID            
100    2674    169 days    1
101    19760    98 days    3
103    2674    167 days    1
109    7904    56 days    3
11    2674    211 days    1

<class 'pandas.core.frame.DataFrame'>
Index: 290 entries, 100 to 99
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype          
---  ------            --------------  -----          
 0   (money, sum)     290 non-null    int64          
 1   (recency, <lambda>)  290 non-null    timedelta64[ns]
 2   (freqency, len)   290 non-null    int64          
dtypes: int64(2), timedelta64[ns](1)
memory usage: 9.1+ KB
```

Code

```
def money(a):
    if a < 1000:
        return 0
    if (1000 <= a) & (a < 2000):
        return 1
    if (2000 <= a) & (a < 3000):
        return 2
    if (3000 <= a) & (a < 4000):
        return 3
    if (4000 <= a) & (a < 5000):
        return 4
    if a >= 5000:
        return 5

rfm['money rank'] = rfm['money'].apply(money)
```

I tried different ways of placing the parentheses, but none of them worked.

If you could help me out, I'd be so grateful. Thank you in advance!!!

Upvotes: 1

Views: 75

Answers (3)

jezrael

Reputation: 862431

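When comparing scalars, use the and operator instead of &. The ValueError itself arises because rfm has MultiIndex columns, so rfm['money'] is a DataFrame and apply passes each column to money as a Series. Remove the last level of the MultiIndex with MultiIndex.droplevel so that rfm['money'] is a Series of scalars.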

So use:

def money(a):
    if a < 1000:
        return 0
    if (1000 <= a) and (a < 2000):
        return 1
    if (2000 <= a) and (a < 3000):
        return 2
    if (3000 <= a) and (a < 4000):
        return 3
    if (4000 <= a) and (a < 5000):
        return 4
    if a >= 5000:
        return 5

rfm.columns = rfm.columns.droplevel(-1)
rfm['money rank'] = rfm['money'].apply(money)
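
To see why dropping the level matters, here is a minimal sketch with made-up data shaped like the frame in the question (the values and the two-column layout are assumptions for illustration only). It shows that the same selection returns a DataFrame before droplevel and a Series after it:

```python
import pandas as pd

# Hypothetical sample data with two-level column labels, as in the question
rfm = pd.DataFrame(
    [[2674, 1], [19760, 3], [7904, 3]],
    index=pd.Index(["100", "101", "109"], name="ID"),
    columns=pd.MultiIndex.from_tuples([("money", "sum"), ("frequency", "len")]),
)

# Selecting by the first level returns a DataFrame, so .apply hands
# whole columns (Series) to the function instead of single numbers
before = rfm["money"]
print(type(before).__name__)  # DataFrame

# After dropping the last level, the same selection is a Series of scalars
flat = rfm.copy()
flat.columns = flat.columns.droplevel(-1)
after = flat["money"]
print(type(after).__name__)  # Series
```

With the Series, apply calls money once per value, so comparisons such as a < 1000 produce a plain True or False.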

Another solution is to use pd.cut:

import numpy as np
import pandas as pd

rfm.columns = rfm.columns.droplevel(-1)

rfm['money rank'] = pd.cut(rfm['money'], 
                           bins=[-np.inf, 1000,2000,3000,4000,5000,np.inf], 
                           labels=[0,1,2,3,4,5],
                           right=False)
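
As a quick check of how the bins behave at the edges, here is a small sketch on a standalone Series (the amounts are made up for illustration):

```python
import numpy as np
import pandas as pd

money = pd.Series([999, 1000, 2674, 4999, 5000, 19760])  # hypothetical amounts

ranks = pd.cut(
    money,
    bins=[-np.inf, 1000, 2000, 3000, 4000, 5000, np.inf],
    labels=[0, 1, 2, 3, 4, 5],
    right=False,  # intervals are [left, right), so 1000 falls into rank 1
)
print(ranks.tolist())  # [0, 1, 2, 4, 5, 5]

# pd.cut returns a categorical column; cast it if plain integers are needed
ranks_int = ranks.astype(int)
```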

Upvotes: 1

Yefet

Reputation: 2086

Another, faster solution is to use NumPy's searchsorted:

import numpy as np

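bins = np.array([1000, 2000, 3000, 4000, 5000])
# side='right' puts a value equal to a bin edge into the higher rank (1000 -> 1),
# matching 1000 <= a < 2000; assumes rfm['money'] is a single numeric column
rfm['money rank'] = bins.searchsorted(rfm['money'], side='right')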
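
One detail worth checking with searchsorted is how values exactly on a bin edge are ranked. The sketch below compares the default side='left' with side='right' on made-up amounts; only side='right' reproduces the half-open intervals from the question, such as 1000 <= a < 2000:

```python
import numpy as np

bins = np.array([1000, 2000, 3000, 4000, 5000])
values = np.array([999, 1000, 2674, 5000, 19760])  # hypothetical amounts

left = bins.searchsorted(values)                 # default side='left'
right = bins.searchsorted(values, side="right")

print(left.tolist())   # [0, 0, 2, 4, 5]  -> 1000 gets rank 0, 5000 gets rank 4
print(right.tolist())  # [0, 1, 2, 5, 5]  -> 1000 gets rank 1, 5000 gets rank 5
```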

Upvotes: 0

T.M15

Reputation: 416

You can write it as:

def money(a):
    if a < 1000:
        return 0
    if 1000 <= a < 2000:
        return 1
    if 2000 <= a < 3000:
        return 2
    if 3000 <= a < 4000:
        return 3
    if 4000 <= a < 5000:
        return 4
    if a >= 5000:
        return 5

In fact, a shorter version of the same logic would be:

def money(a):
    return min(5, a//1000) 

PS: The above solution works as long as money is not negative. If negative values are possible, you can clamp the result at 0:

def money(a):
    return max(0, min(5, a//1000))

Also, since you are only passing money to .apply, you can use a lambda function instead:

rfm['money rank'] = rfm['money'].apply(lambda a: max(0, min(5, a//1000)))
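
To confirm that the integer-division shortcut matches the if-chain, here is a small sketch comparing the two on a few made-up integer amounts (the function names money_if and money_short are hypothetical, and integer input is assumed, as in the int64 column from the question):

```python
def money_if(a):
    # Explicit thresholds, as in the question
    if a < 1000:
        return 0
    if 1000 <= a < 2000:
        return 1
    if 2000 <= a < 3000:
        return 2
    if 3000 <= a < 4000:
        return 3
    if 4000 <= a < 5000:
        return 4
    return 5


def money_short(a):
    # Floor division gives the thousands bucket; clamp it to the range 0..5
    return max(0, min(5, a // 1000))


samples = [-500, 0, 999, 1000, 2674, 4999, 5000, 19760]  # hypothetical amounts
results = [money_short(a) for a in samples]
print(results)  # [0, 0, 0, 1, 2, 4, 5, 5]
print(results == [money_if(a) for a in samples])  # True
```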

Hope that helps!

Upvotes: 0
