Arpp
Arpp

Reputation: 107

Calculate previous occurence

df  month order  customer  
0   Jan    yes       020    
1   Feb    yes       041   
2   April  no        020  
3   May    no        020     

Is there a way to calculate the last month a customer ordered if order = no? Expected Output

df  month order   customer  last_order
0   Jan    yes       020    
1   Feb    yes       041   
2   April  no        020     Jan
3   May    no        020     Jan

Upvotes: 4

Views: 90

Answers (2)

Ch3steR
Ch3steR

Reputation: 20669

You can df.groupby, and pd.Series.eq to check if value is yes, then use pd.Series.where and use pd.Series.ffill, then mask using pd.Series.mask

def func(s):
    m = s['order'].eq('yes')
    f = s['month'].where(m).ffill()
    return f.mask(m)

df['last_order'] = df.groupby('customer', group_keys=False).apply(func)

   month order customer last_order
0    Jan   yes      020        NaN
1    Feb   yes      041        NaN
2  March    no      020        Jan

Explanation

What happens in each of the group after groupby is the below, for example consider group where customer is 020

  month order
0   jan   yes
1   apr    no
2   may    no
3   jun   yes
4   jul    no

m = df['order'].eq('yes') # True where `order` is 'yes'
f = df['month'].where(m)#.ffill()
f
0    jan # ---> \
1    NaN         \ #`jan` and `jun` are visible as 
2    NaN         / # they were the months with `order` 'yes'
3    jun # ---> /
4    NaN
Name: month, dtype: object
# If you chain the above with with `ffill` it would fill the NaN values.

f = df['month'].where(m).ffill()
f
0    jan
1    jan # filled with valid above value i.e Jan
2    jan # filled with valid above value i.e Jan
3    jun
4    jun # filled with valid above value i.e Jun
Name: month, dtype: object

f.mask(m) # works opposite of `pd.Series.where`

0    NaN # --->\
1    jan        \ # Marked values `NaN` where order was `yes`.
2    jan        /
3    NaN # --->/
4    jun
Name: month, dtype: object

Upvotes: 6

Sebastien D
Sebastien D

Reputation: 4482

You might do it with df.iterrows:

df = pd.DataFrame([{'month': 'Jan',  'order': 'yes',  'customer': '020',  'month_2': 1,  'last_order': None},
 {'month': 'Feb',  'order': 'yes',  'customer': '041',  'month_2': 2,  'last_order': None},
 {'month': 'April',  'order': 'no',  'customer': '020',  'month_2': 4,  'last_order': 'Jan'},
 {'month': 'May',  'order': 'no',  'customer': '020',  'month_2': 5,  'last_order': 'Jan'}])

#Lets convert months to numeric value
dict_months = dict(Jan=1, Feb=2, March=3, April = 4,May=5, June = 6,Jul = 7, Aug = 8, Sep = 9, Oct = 10, Nov =11, Dec = 12)
df['month_2'] = df.month.map(dict_months)

#Insert a blank column for last_order
df['last_order'] = None

#Let's iter throught rows
for idx, row in df.iterrows():
    if row['order'] == "yes": continue
    #For each row, grab the customer and the current month and searchfor orders in previous months
    df_temp = df[(df.customer == row['customer']) & (df.month_2 < row['month_2'] )& (df.order == "yes")]
    #If any result found, let pick the last know order and update accordingly the DataFrame
    if df_temp.shape[0]>0: df.loc[[idx],'last_order'] = df_temp['month'].iloc[-1]
#remove unecessary column
del df['month_2']

Output

| month   | order   |   customer | last_order   |
|:--------|:--------|-----------:|:-------------|
| Jan     | yes     |        020 |              |
| Feb     | yes     |        041 |              |
| April   | no      |        020 | Jan          |
| May     | no      |        020 | Jan          |

Upvotes: 1

Related Questions