Reputation: 768
I have a dataframe called cust
> head(cust,15)
Txn_date Cust_no Credit
1 2013-12-02 12345000 400.00
2 2013-12-02 12345000 300.00
3 2013-12-02 12345000 304.71
4 2013-12-02 12345000 475.00
5 2013-12-02 12345000 325.00
6 2013-12-02 34567890 1390.00
7 2013-12-02 34567890 100.00
8 2013-12-02 34567890 500.00
9 2013-12-02 23232323 5.00
10 2013-12-02 23232323 130.00
11 2013-12-02 23232323 5975.00
12 2013-12-02 23232323 3711.00
13 2013-12-02 14345422 12530.50
14 2013-12-02 14345422 3312.00
15 2013-12-02 98765432 370.00
To calculate total amount, based on the Cust_no, I have used the within
and then applied cumsum
to it, like below
newCust <-within(cust, {
RunningTotal <- ave(cust$Credit, cust$Cust_no, FUN = cumsum)})
and hence my result is this, as you can see the Credit is getting incremented based on the cust number
>head(newCust,15)
Txn_date Cust_no Credit RunningTotal
1 2013-12-02 12345000 400.00 400.00
2 2013-12-02 12345000 300.00 700.00
3 2013-12-02 12345000 304.71 1004.71
4 2013-12-02 12345000 475.00 1479.71
5 2013-12-02 12345000 325.00 1804.71
6 2013-12-02 34567890 1390.00 1390.00
7 2013-12-02 34567890 100.00 1490.00
8 2013-12-02 34567890 500.00 1990.00
9 2013-12-02 23232323 5.00 5.00
10 2013-12-02 23232323 130.00 135.00
11 2013-12-02 23232323 5975.00 6110.00
12 2013-12-02 23232323 3711.00 9821.00
13 2013-12-02 14345422 12530.50 12530.50
14 2013-12-02 14345422 3312.00 15842.5
15 2013-12-02 98765432 370.00 370
Now my question is, how do I find out the number of transactions for every Cust_no, using the above logic of within
and length
?? or maybe, some other logic.
I am not getting the desired output as shown below even I tried using aggregate()
and apply
..
Txn_date Cust_no Credit Frequency
1 2013-12-02 12345000 400.00 1
2 2013-12-02 12345000 300.00 2
3 2013-12-02 12345000 304.71 3
4 2013-12-02 12345000 475.00 4
5 2013-12-02 12345000 325.00 5
6 2013-12-02 34567890 1390.00 1
7 2013-12-02 34567890 100.00 2
8 2013-12-02 34567890 500.00 3
9 2013-12-02 23232323 5.00 1
10 2013-12-02 23232323 130.00 2
11 2013-12-02 23232323 5975.00 3
12 2013-12-02 23232323 3711.00 4
13 2013-12-02 14345422 12530.50 1
14 2013-12-02 14345422 3312.00 2
15 2013-12-02 98765432 370.00 1
Upvotes: 0
Views: 578
Reputation: 887711
You can try
within(cust, Frequency <- ave(seq_along(Cust_no), Cust_no, FUN=seq_along))
# Txn_date Cust_no Credit Frequency
#1 2013-12-02 12345000 400.00 1
#2 2013-12-02 12345000 300.00 2
#3 2013-12-02 12345000 304.71 3
#4 2013-12-02 12345000 475.00 4
#5 2013-12-02 12345000 325.00 5
#6 2013-12-02 34567890 1390.00 1
#7 2013-12-02 34567890 100.00 2
#8 2013-12-02 34567890 500.00 3
#9 2013-12-02 23232323 5.00 1
#10 2013-12-02 23232323 130.00 2
#11 2013-12-02 23232323 5975.00 3
#12 2013-12-02 23232323 3711.00 4
#13 2013-12-02 14345422 12530.50 1
#14 2013-12-02 14345422 3312.00 2
#15 2013-12-02 98765432 370.00 1
Or using splitstackshape
library(splitstackshape)
getanID(cust, 'Cust_no')
Or using data.table
library(data.table)
setDT(cust)[, Frequency:=1:.N, by=Cust_no]
Upvotes: 2