Reputation: 300
In the dataset below, I want to identify the top 3 most time-consuming projects.
library(dplyr)
TransID <- c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1014, 1018, 1022, 1023, 1024)
EmpID <- c('M001', 'M001', 'M001', 'M001', 'B005', 'B005', 'B005', 'B005', 'X101', 'X101', 'X101', 'Z101', 'K501', 'K501', 'K501', 'K501')
ProjectID <- c(200, 200, 200, 200, 500, 500, 500, 500, 950, 950, 950, 950, 1050, 1050, 1050, 1050)
Site <- c('X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z', 'G', 'G', 'G', 'G', 'K', 'K', 'K')
Region <- c('NE', 'NW', 'SE', 'SW', 'MW', 'NW', 'SW', 'NE', 'NC', 'MW', 'NE', 'SE', 'SW', 'NC', 'SW', 'SE')
hour_difference <- c(1.45, 2.14, 2.53, 3.69, 1.73, 2.47, 3.63, 1.59, 0.75, 1.18, 2.78, 9.55, 1.85, 2.39, 5.52, 0.23)
df <- data.frame(TransID, EmpID, ProjectID, Site, Region, hour_difference)
df
Simply put, for each ProjectID, I want to sum the hour_difference and sort the result in descending order.

My attempt:
df %>%
group_by(ProjectID,hour_difference) %>%
summarize(sum().sort_values())
Desired output: for example, ProjectID = 950 will have a sum of 14.26.
Upvotes: 0
Views: 59
Reputation: 15143
I'm not sure whether you want descending order by ProjectID or by the sum of hour_difference, but you may try either of the following.

Sorted by sum(hour_difference):
df %>%
  group_by(ProjectID) %>%
  summarise(res = sum(hour_difference)) %>%
  arrange(desc(res))

  ProjectID   res
      <dbl> <dbl>
1       950 14.3
2      1050  9.99
3       200  9.81
4       500  9.42
Sorted by ProjectID:
df %>%
  group_by(ProjectID) %>%
  summarise(res = sum(hour_difference)) %>%
  arrange(desc(ProjectID))

  ProjectID   res
      <dbl> <dbl>
1      1050  9.99
2       950 14.3
3       500  9.42
4       200  9.81
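
Since the question asks for only the top 3 projects, one way to cut the summary down is `slice_max()`. This is a minimal sketch, assuming dplyr 1.0.0 or later (where `slice_max()` is available). The data frame is rebuilt here with just the two columns that matter, and the name `top3` is only illustrative.

```r
library(dplyr)

# Same ProjectID and hour_difference values as in the question;
# the other columns are omitted because they are not needed for the sum.
df <- data.frame(
  ProjectID = rep(c(200, 500, 950, 1050), each = 4),
  hour_difference = c(1.45, 2.14, 2.53, 3.69, 1.73, 2.47, 3.63, 1.59,
                      0.75, 1.18, 2.78, 9.55, 1.85, 2.39, 5.52, 0.23)
)

top3 <- df %>%
  group_by(ProjectID) %>%
  summarise(res = sum(hour_difference)) %>%
  # keep the 3 rows with the largest totals, returned in descending order
  slice_max(res, n = 3)

top3
```

With this data, that should keep ProjectID 950, 1050 and 200. Note that `slice_max()` keeps ties by default (`with_ties = TRUE`), so it can return more than 3 rows if totals are equal. If you prefer to stay with `arrange(desc(res))`, piping the result into `head(3)` is an alternative.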
Upvotes: 1