Reputation: 97
What I want to do
I want to do RFM analysis on purchase data from an e-commerce site.
I have processed the data into RFM format, and now I want to rank every ID based on the values in each column (Money, Recency and Frequency).
However, I got the error message below.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-e7bf5ddc856d> in <module>
13 return 5
14
---> 15 rfm['money rank'] = rfm['money'].apply(money)
16 rfm.head()
c:\users\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7766 kwds=kwds,
7767 )
-> 7768 return op.get_result()
7769
7770 def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
c:\users\lib\site-packages\pandas\core\apply.py in get_result(self)
183 return self.apply_raw()
184
--> 185 return self.apply_standard()
186
187 def apply_empty_result(self):
c:\users\lib\site-packages\pandas\core\apply.py in apply_standard(self)
274
275 def apply_standard(self):
--> 276 results, res_index = self.apply_series_generator()
277
278 # wrap results
c:\users\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
288 for i, v in enumerate(series_gen):
289 # ignore SettingWithCopy here in case the user mutates
--> 290 results[i] = self.f(v)
291 if isinstance(results[i], ABCSeries):
292 # If we have a view on v, we need to make a copy because
<ipython-input-15-e7bf5ddc856d> in money(a)
1 def money(a):
----> 2 if a < 1000:
3 return 0
4 if (1000 <= a) & (a < 2000):
5 return 1
c:\users\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1440 @final
1441 def __nonzero__(self):
-> 1442 raise ValueError(
1443 f"The truth value of a {type(self).__name__} is ambiguous. "
1444 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Data
```
money recency frequency
sum <lambda> len
ID
100 2674 169 days 1
101 19760 98 days 3
103 2674 167 days 1
109 7904 56 days 3
11 2674 211 days 1
<class 'pandas.core.frame.DataFrame'>
Index: 290 entries, 100 to 99
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 (money, sum) 290 non-null int64
1 (recency, <lambda>) 290 non-null timedelta64[ns]
2 (freqency, len) 290 non-null int64
dtypes: int64(2), timedelta64[ns](1)
memory usage: 9.1+ KB
```
Code
```
def money(a):
    if a < 1000:
        return 0
    if (1000 <= a) & (a < 2000):
        return 1
    if (2000 <= a) & (a < 3000):
        return 2
    if (3000 <= a) & (a < 4000):
        return 3
    if (4000 <= a) & (a < 5000):
        return 4
    if a >= 5000:
        return 5

rfm['money rank'] = rfm['money'].apply(money)
```
I tried different kinds of parentheses, but none of them worked.
If you could help me out, I'd be so grateful. Thank you in advance!!!
Upvotes: 1
Views: 75
Reputation: 862431
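When working with scalars, use `and` instead of `&`. You also need to remove the last level of the column MultiIndex with `MultiIndex.droplevel`: with two-level columns, `rfm['money']` returns a DataFrame, so `apply` passes a whole column (a Series) to `money`, and `if a < 1000` raises the error.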
So use:

    def money(a):
        if a < 1000:
            return 0
        if (1000 <= a) and (a < 2000):
            return 1
        if (2000 <= a) and (a < 3000):
            return 2
        if (3000 <= a) and (a < 4000):
            return 3
        if (4000 <= a) and (a < 5000):
            return 4
        if a >= 5000:
            return 5

    # drop the second column level ('sum', '<lambda>', 'len')
    rfm.columns = rfm.columns.droplevel(-1)
    rfm['money rank'] = rfm['money'].apply(money)
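To see why the column level matters, here is a minimal sketch. The small `rfm` frame below is a hypothetical stand-in for the question's data (only two columns, three made-up rows); it just shows what `rfm['money']` returns before and after `droplevel`.

```python
import pandas as pd

# Hypothetical sample with two-level column names, like the question's output.
columns = pd.MultiIndex.from_tuples([("money", "sum"), ("frequency", "len")])
rfm = pd.DataFrame(
    [[2674, 1], [19760, 3], [800, 1]],
    index=pd.Index(["100", "101", "103"], name="ID"),
    columns=columns,
)

# With MultiIndex columns, selecting the top level returns a DataFrame,
# so DataFrame.apply would hand each column (a Series) to the function.
before = rfm["money"]
print(type(before).__name__)

# After dropping the last level, the same selection returns a Series,
# and Series.apply passes one scalar at a time.
rfm.columns = rfm.columns.droplevel(-1)
after = rfm["money"]
print(type(after).__name__)
```

The first `print` shows `DataFrame` and the second shows `Series`, which is the difference between the failing and the working call.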
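Another solution is to use `pd.cut`:

    import numpy as np
    import pandas as pd

    # skip this line if the column level was already dropped
    rfm.columns = rfm.columns.droplevel(-1)
    rfm['money rank'] = pd.cut(rfm['money'],
                               bins=[-np.inf, 1000, 2000, 3000, 4000, 5000, np.inf],
                               labels=[0, 1, 2, 3, 4, 5],
                               right=False)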
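As a quick check of the bin edges, here is a small sketch with hypothetical amounts placed on and around the boundaries. It assumes nothing about the real data beyond the thresholds used in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen to sit on and around the bin edges.
money = pd.Series([0, 999, 1000, 2674, 4999, 5000, 19760])

rank = pd.cut(
    money,
    bins=[-np.inf, 1000, 2000, 3000, 4000, 5000, np.inf],
    labels=[0, 1, 2, 3, 4, 5],
    right=False,  # intervals are [left, right), so 1000 lands in rank 1
)

# pd.cut returns a categorical column; cast it if plain integers are needed.
rank_int = rank.astype(int)
print(rank_int.tolist())
```

With `right=False` the ranks match the `if` chain: 999 gets 0, 1000 gets 1, and anything from 5000 upward gets 5.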
Upvotes: 1
Reputation: 2086
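Another, faster solution is to use NumPy's `searchsorted`. Pass `side='right'` so that a value exactly on a boundary (for example 1000) gets the higher rank, as in the `money` function:

    import numpy as np

    bins = np.array([1000, 2000, 3000, 4000, 5000])
    # assumes the extra column level was already dropped, so rfm['money'] is a Series
    rfm['money rank'] = bins.searchsorted(rfm['money'], side='right')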
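The `side` argument of `searchsorted` decides which rank a value exactly on a bin edge receives. A small sketch with hypothetical amounts, comparing both settings against the boundaries used in the question:

```python
import numpy as np

bins = np.array([1000, 2000, 3000, 4000, 5000])
values = np.array([999, 1000, 2674, 5000, 19760])  # hypothetical amounts

# Default side='left': a value equal to an edge is placed before it.
left = bins.searchsorted(values)

# side='right': a value equal to an edge is placed after it,
# which matches conditions such as 1000 <= a < 2000.
right = bins.searchsorted(values, side="right")

print(left.tolist())
print(right.tolist())
```

The two results differ only for 1000 and 5000, the values that sit exactly on an edge.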
Upvotes: 0
Reputation: 416
You can write it as:

    def money(a):
        if a < 1000:
            return 0
        if 1000 <= a < 2000:
            return 1
        if 2000 <= a < 3000:
            return 2
        if 3000 <= a < 4000:
            return 3
        if 4000 <= a < 5000:
            return 4
        if a >= 5000:
            return 5

In fact, a shorter version would be:

    def money(a):
        # integer division gives the thousands bucket; cap it at 5
        return min(5, a // 1000)

PS: The version above works as long as money is never negative. If negative values are possible, you can write it as:

    def money(a):
        # clamp the bucket to the range 0..5
        return max(0, min(5, a // 1000))

Also, since `money` is only passed to `.apply`, you can use a lambda function instead:

    rfm['money rank'] = rfm['money'].apply(lambda a: max(0, min(5, a // 1000)))
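The same arithmetic can also be done on the whole column at once, without `.apply`. This is a different technique from the lambda (floor division followed by `Series.clip`), sketched here on a hypothetical `money` Series rather than the real `rfm` frame:

```python
import pandas as pd

# Hypothetical amounts, including a negative one and values on the edges.
money = pd.Series([-50, 0, 999, 1000, 2674, 5000, 19760])

# Element-wise version from the answer.
via_apply = money.apply(lambda a: max(0, min(5, a // 1000)))

# Column-wise version: floor-divide, then clamp the result to 0..5.
vectorized = (money // 1000).clip(lower=0, upper=5)

print(vectorized.tolist())
```

Both versions give the same ranks for these values; the column-wise form avoids calling a Python function once per row.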
Hope that helps!
Upvotes: 0