Mathumathi Anblagan
Mathumathi Anblagan

Reputation: 69

How to groupby column value using R programming

I have a table

Employee Details:

EmpID | WorkingPlaces | Salary
1001  | Bangalore     | 5000
1001  | Chennai       | 6000
1002  | Bombay        | 1000
1002  | Chennai       | 500
1003  | Pune          | 2000
1003  | Mangalore     | 1000

A same employee works for different places in a month. How to find the top 2 highly paid employees.

The result table should look like

EmpID | WorkingPlaces | Salary
1001  | Chennai       | 6000
1001  | Bangalore     | 5000
1003  | Pune          | 2000
1003  | Mangalore     | 1000

My code: in R language

knime.out <- aggregate(x= $"EmpID", by = list(Thema = $"WorkingPlaces", Project = $"Salary"), FUN = "length") [2]

But this doesnt give me the expected result. Kindly help me to correct the code.

Upvotes: 2

Views: 996

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388982

A base R solution. Considering your dataframe as df. We first aggregate the data by EmpId and calculate their sum. Then we select the top 2 EmpID's for which the salary is highest and find the subset of those ID's in the original dataframe using %in%.

temp <- aggregate(Salary~EmpID, df, sum)
df[df$EmpID %in% temp$EmpID[tail(order(temp$Salary), 2)], ]

#  EmpID WorkingPlaces Salary
#1  1001     Bangalore   5000
#2  1001       Chennai   6000
#5  1003          Pune   2000
#6  1003     Mangalore   1000

Upvotes: 2

akrun
akrun

Reputation: 887118

We can try with dplyr

library(dplyr)
df1 %>%
   group_by(EmpID) %>% 
   mutate(SumSalary = sum(Salary)) %>% 
   arrange(-SumSalary, EmpID) %>% 
   head(4) %>%
   select(-SumSalary)

Upvotes: 3

Related Questions