Reputation: 69
I have a table
Employee Details:
EmpID | WorkingPlaces | Salary
1001 | Bangalore | 5000
1001 | Chennai | 6000
1002 | Bombay | 1000
1002 | Chennai | 500
1003 | Pune | 2000
1003 | Mangalore | 1000
A same employee works for different places in a month. How to find the top 2 highly paid employees.
The result table should look like
EmpID | WorkingPlaces | Salary
1001 | Chennai | 6000
1001 | Bangalore | 5000
1003 | Pune | 2000
1003 | Mangalore | 1000
My code: in R language
knime.out <- aggregate(x= $"EmpID", by = list(Thema = $"WorkingPlaces", Project = $"Salary"), FUN = "length") [2]
But this doesnt give me the expected result. Kindly help me to correct the code.
Upvotes: 2
Views: 996
Reputation: 388982
A base R solution. Considering your dataframe as df
. We first aggregate
the data by EmpId
and calculate their sum
. Then we select the top 2 EmpID
's for which the salary is highest and find the subset of those ID's in the original dataframe using %in%
.
temp <- aggregate(Salary~EmpID, df, sum)
df[df$EmpID %in% temp$EmpID[tail(order(temp$Salary), 2)], ]
# EmpID WorkingPlaces Salary
#1 1001 Bangalore 5000
#2 1001 Chennai 6000
#5 1003 Pune 2000
#6 1003 Mangalore 1000
Upvotes: 2
Reputation: 887118
We can try with dplyr
library(dplyr)
df1 %>%
group_by(EmpID) %>%
mutate(SumSalary = sum(Salary)) %>%
arrange(-SumSalary, EmpID) %>%
head(4) %>%
select(-SumSalary)
Upvotes: 3