Reputation: 12034
I have a dataset containing information in following format.
Customerid ,customerinvoicedate, customerduedate, customrpaiddate, amount, cleared ?
1, 03-mar-2017, 02-june-2017, 03-april-2017, $200 , yes
2, 01-jan-2017, 11-dec-2017, 15-dec-2017, $25000, yes
1, 01-aug-2017, 05-oct-2017, 04-sep-2017, $100, yes
I want to group customers and in each group for each invoice , I want to count number of invoices which was cleared before customerinvoicedate. Now for customer 1 while recording invoice dated on 01-aug-2017 it should check previous invoices which are cleared and count them. (here count is 1)
I tried this but doesn't seem to be working.
data.groupby(['Customerid']).apply(lambda x:np.count(data['customrpaiddate']<=x['customerinvoicedate']))
it gives an error :- ValueError: Can only compare identically-labeled Series objects
Upvotes: 0
Views: 185
Reputation: 7466
After you compare "paid date" and "due date", you can sum the boolean result (as True = 1 and False = 0) to count how many occurrences are True
.
Since you want "the number of invoices cleared so far", you need to use cumulative sum:
cleared_invoices = df.groupby('Customerid').apply(lambda x: (x['customrpaiddate'] <= x['customerduedate']).cumsum())\
.reset_index(level = 0, drop = True)\
.rename('Cleared Invoices')
out = pd.concat([df, cleared_invoices], axis = 1)
which gives:
out
Out[137]:
Customerid customerinvoicedate customerduedate customrpaiddate \
0 1 03-03-2017 2017-02-06 2017-03-04
1 2 01-01-2017 2017-11-12 2017-12-15
2 1 01-8-2017 2017-05-10 2017-04-09
amount cleared Cleared Invoices
0 $200 yes 0
1 $25000 yes 0
2 $100 yes 1
Upvotes: 1