Reputation: 161
I have data in the below format. I've to find the sum of values of the top 25% in each group - grouped by ID.
If the number corresponding to the top 25% in a group is a fraction, give the data for the ceil(higher number) of that fraction
ID Value
3 10
3 83
1 13
2 62
2 26
3 63
3 37
1 17
3 22
3 41
3 45
2 65
3 63
1 45
2 71
3 61
2 97
2 89
3 72
3 16
1 22
While I was able to get the top n rows from using the below code, I've not figured a way to get the top n% in each group
p2<-group_by(data, ID) %>%
mutate(rank = rank(desc(Value))) %>%
arrange(rank) %>%
filter(rank <= 3)
The expected result looks something like this. There are 4 rows in the data with ID 1 - top 25% means 1 row.
There are 6 rows with ID 2 - ceil(0.25*6) is 2 - so top 2 rows. Similarly, for ID 3, there are 11 rows - ceil(0.25*11) is top 3 rows under ID 3.
Please let me know how to get this!
ID Value
1 45
2 97
2 89
3 83
3 72
3 65
Upvotes: 1
Views: 326
Reputation: 388807
We can arrange
the dataframe in descending order of Value
and then slice
the number of rows based on the ratio for each group.
library(dplyr)
df %>%
arrange(desc(Value)) %>%
group_by(ID) %>%
slice(1:ceiling(0.25 * n()))
# ID Value
# <int> <int>
#1 1 45
#2 2 97
#3 2 89
#4 3 83
#5 3 72
#6 3 63
We can also do this in base R. First order
the rows based on Value
, split
the rows based on ID
and then select top n
rows from each group.
new_df <- df[order(df$Value, decreasing = TRUE), ]
new_df[unlist(Map(head, split(1:nrow(new_df), new_df$ID),
ceiling(table(new_df$ID) * 0.25))), ]
# ID Value
#14 1 45
#17 2 97
#18 2 89
#2 3 83
#19 3 72
#6 3 63
Upvotes: 7