Reputation: 363
Let my dataframe has 2 columns, customer id & transaction amount. Now for each unique customer id i would like to find the transaction amount(sorted in descending order) & then from the sorted column i will find the mean transaction amount for top three transactions of the sorted list.
Cust_id trans_amount
12345 100
12345 200
12345 170
12345 300
12345 250
12456 140
12456 240
12456 160
12456 100
The format i am looking for is,
Cust_id trans_amount
12345 300
12345 250
12345 200
12345 170
12345 100
12456 240
12456 160
12456 140
12456 100
and from there the mean for top 3, ie,
Cust_id mean_for_top_3
12345 250
12456 180
for the intermediate part,I have tried,
ddply(cust_data,.(cust_id.),summarize,sorted_amount=sort(trans_amount,,decreasing=TRUE))
but not getting the result. Please advise how can i reach to my desired output.
Upvotes: 0
Views: 59
Reputation: 6325
An idiomatic solution using dplyr
df <- read.table(text = "Cust_id trans_amount
12345 100
12345 200
12345 170
12345 300
12345 250
12456 140
12456 240
12456 160
12456 100 ", header = T)
library(dplyr)
df %>% group_by(Cust_id) %>%
arrange(desc(trans_amount), .by_group = T) %>%
top_n(n = 3) %>%
summarize(mean = mean(trans_amount))
# A tibble: 2 x 2
Cust_id mean
<int> <dbl>
1 12345 250
2 12456 180
Alternative with count:
> df %>% group_by(Cust_id) %>%
+ #arrange(desc(trans_amount), .by_group = T) %>%
+ mutate(count = n()) %>%
+ top_n(n = 3, wt = trans_amount) %>%
+ mutate(mean = mean(trans_amount)) %>%
+ select(Cust_id,mean,count) %>% distinct()
# A tibble: 2 x 3
# Groups: Cust_id [2]
Cust_id mean count
<int> <dbl> <int>
1 12345 250 5
2 12456 180 4
>
Upvotes: 1
Reputation: 28339
Solution using data.table
:
library(data.table)
setDT(cust_data)
cust_data_sort <- cust_data[, .(trans_amount = sort(trans_amount, decreasing = TRUE)), Cust_id]
cust_data_sort[, .(mean_for_top_3 = mean(head(trans_amount, 3))), Cust_id]
Cust_id mean_for_top_3
1: 12345 250
2: 12456 180
If you don't need sorted table cust_data_sort
, then you can do get mean using this:
cust_data[, .(mean_for_top_3 = mean(head(sort(trans_amount, decreasing = TRUE), 3))), Cust_id]
Upvotes: 3