Find mean for sorted top n transactions

Question

Let my dataframe has 2 columns, customer id & transaction amount. Now for each unique customer id i would like to find the transaction amount(sorted in descending order) & then from the sorted column i will find the mean transaction amount for top three transactions of the sorted list.

Cust_id     trans_amount
12345          100      
12345          200      
12345          170      
12345          300      
12345          250
12456          140        
12456          240       
12456          160       
12456          100

The format i am looking for is,

Cust_id     trans_amount
12345          300               
12345          250      
12345          200      
12345          170      
12345          100
12456          240        
12456          160       
12456          140       
12456          100

and from there the mean for top 3, ie,

Cust_id    mean_for_top_3
12345         250
12456         180

for the intermediate part,I have tried,

ddply(cust_data,.(cust_id.),summarize,sorted_amount=sort(trans_amount,,decreasing=TRUE))

but not getting the result. Please advise how can i reach to my desired output.

pogibas · Accepted Answer

Solution using data.table:

library(data.table)
setDT(cust_data)
cust_data_sort <- cust_data[, .(trans_amount = sort(trans_amount, decreasing = TRUE)), Cust_id]
cust_data_sort[, .(mean_for_top_3 = mean(head(trans_amount, 3))), Cust_id]
   Cust_id mean_for_top_3
1:   12345            250
2:   12456            180

If you don't need sorted table cust_data_sort, then you can do get mean using this:

cust_data[, .(mean_for_top_3 = mean(head(sort(trans_amount, decreasing = TRUE), 3))), Cust_id]

Find mean for sorted top n transactions

Answers (2)

Related Questions