Mike

Incorporating dplyr Join and Set Operations into a Custom Function

I have the two simple dataframes below. Using dplyr and the tidyverse, I would like to find the categories in "Task2" of the second dataframe (Df2) that are not in "Task" of the first dataframe (Df), preferably with dplyr's "setdiff" function. I would also like to keep the corresponding times from the "Time" column of Df2.
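On plain vectors, the set difference being asked for is what base R's setdiff() computes: the unique elements of the first vector that do not occur in the second. The sketch below applies it to the two task columns, copied out as character vectors (an assumption that sidesteps factor handling). It only returns task names and drops Time, which is why a join is still needed afterwards to recover the times.

```r
# Task names copied from the question's Df$Task and Df2$Task2 columns
task_old <- c("Feed cat", "Make dinner", "Do homework")
task_new <- c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
              rep("Do homework", 2), "Buy groceries")

# Unique elements of task_new that never occur in task_old,
# in order of first appearance
new_only <- setdiff(task_new, task_old)
new_only
#> [1] "Iron shirt"    "Buy groceries"
```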

The end product should therefore have two rows: one for client "Chris" with "Iron shirt" and a total time of 30 (20 + 10), and one for client "Eric" with "Buy groceries" and the corresponding time of 8.

I would also like to drop the Date column.
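Dropping a single column can be sketched with dplyr's select() and a minus sign, shown here on the question's Df (building it with stringsAsFactors = FALSE is an assumption made for the example). Note that a group_by() followed by summarise() also drops Date, because summarise() keeps only the grouping columns and the new summaries.

```r
library(dplyr)

# The question's first dataframe, with character rather than factor columns
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Date       = c("10/27/2016", "09/22/2016", "10/11/2016"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)

# select() with a minus sign keeps every column except Date
Df_nodate <- Df %>% select(-Date)
names(Df_nodate)
#> [1] "CaseWorker" "Client"     "Task"
```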

One way I can think of is to use dplyr's "setdiff" function to separate out the two rows (I realize the Task and Task2 column names will have to be changed so they match), then rejoin the total times with a join function.
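The setdiff-then-rejoin route described here can be sketched as follows. This is one illustration of that idea, not the only way to do it; it assumes dplyr 1.0 or later (for the .groups argument) and builds the data with stringsAsFactors = FALSE so the task columns are plain character vectors. Date is left out because it plays no part, and the names new_tasks and result are illustrative.

```r
library(dplyr)

# The question's data (Date omitted; character columns instead of factors)
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CaseWorker = c(rep("John", 10), rep("Kim", 3)),
  Client     = c(rep("Chris", 10), rep("Eric", 3)),
  Task2      = c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
                 rep("Do homework", 2), "Buy groceries"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

# Step 1: give both task columns the same name, then take the set difference
# of the one-column data frames; distinct() guards against repeated task names
new_tasks <- dplyr::setdiff(
  Df2 %>% select(Task = Task2),
  Df %>% select(Task)
) %>%
  distinct()

# Step 2: join the times back on and total them per caseworker, client and task
result <- Df2 %>%
  inner_join(new_tasks, by = c("Task2" = "Task")) %>%
  group_by(CaseWorker, Client, Task2) %>%
  summarise(Time = sum(Time), .groups = "drop")

result
```

Compared with a single anti_join, this takes two passes over the data, but it mirrors the set-operation framing of the question directly.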

Finally, I would like this to be a custom function, since I will have to do this task repeatedly: something like "Differences(Df1, Df2)", so I can pass in the two dataframes and get the result.

I hope this isn't asking too much! I'm new to custom functions, especially ones that use dplyr and piping.

Hope someone can help me out!

CaseWorker<-c("John","John","Kim")

Client<-c("Chris","Chris","Eric")

Task<-c("Feed cat","Make dinner","Do homework")

Date<-c("10/27/2016","09/22/2016","10/11/2016")

Df<-data.frame(CaseWorker,Client,Date,Task)

Second dataframe:

CaseWorker<-c("John","John","John","John","John","John","John","John","John",
          "John","Kim","Kim","Kim")

Client<-c("Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Eric","Eric","Eric")

Date<-c("11/10/2016","10/10/2016","11/13/2016","09/18/2016","11/11/2016","09/19/2016","08/08/2016","10/10/2016","08/05/2016","11/12/2016","09/09/2016","11/11/2016","09/10/2016")

Task2<-c("Feed cat","Feed cat","Feed cat","Feed cat","Feed cat","Make dinner","Make dinner","Make dinner","Iron shirt","Iron shirt","Do homework",
"Do homework","Buy groceries")

Time<-c(20,34,11,10,5,6,55,30,20,10,12,10,8)

Df2<-data.frame(CaseWorker,Client,Date,Task2,Time)

Answers (1)

akrun

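We can use anti_join, which keeps only the rows of Df2 whose Task2 has no match in Df's Task column. Grouping those rows by CaseWorker, Client and Task2 and summing Time then gives one total per task; the Date column is dropped because summarise keeps only the grouping columns and the new summary.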

library(dplyr)
anti_join(Df2, Df, by = c("Task2"="Task")) %>%
         group_by(CaseWorker,Client, Task2) %>% 
         summarise(Time = sum(Time))
#    CaseWorker Client         Task2  Time
#        <fctr> <fctr>        <fctr> <dbl>
#1       John  Chris    Iron shirt    30
#2        Kim   Eric Buy groceries     8
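To see what anti_join contributes on its own, the sketch below stops before the grouping step. anti_join is a filtering join: it returns the rows of its first argument that have no match in the second and adds no columns from the second. The data are rebuilt without Date and with stringsAsFactors = FALSE, an assumption made only to keep the example short and free of factor-level warnings.

```r
library(dplyr)

# The question's data (Date omitted; character columns instead of factors)
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CaseWorker = c(rep("John", 10), rep("Kim", 3)),
  Client     = c(rep("Chris", 10), rep("Eric", 3)),
  Task2      = c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
                 rep("Do homework", 2), "Buy groceries"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

# Rows of Df2 whose Task2 value does not occur in Df$Task
unmatched <- anti_join(Df2, Df, by = c("Task2" = "Task"))

# Two "Iron shirt" rows (Time 20 and 10) and one "Buy groceries" row (Time 8);
# summing Time within each task afterwards gives 30 and 8
unmatched
```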

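To wrap this in a reusable function (the column names Task2, Task, CaseWorker, Client and Time are hard-coded, so both dataframes must use them):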

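DiffGoals <- function(dat1, dat2) {
  # dat1: data with a Task2 column; dat2: data with a Task column
  anti_join(dat1, dat2, by = c("Task2" = "Task")) %>%  # rows of dat1 whose task is not in dat2
    group_by(CaseWorker, Client, Task2) %>%
    summarise(Time = sum(Time))                        # total time per task
}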

DiffGoals(Df2, Df)
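If the column names change from one data set to the next, one possible generalisation passes them in as strings. diff_tasks() is a hypothetical helper, not part of dplyr; it assumes dplyr 1.0 or later for across(), all_of() and the .groups argument. setNames() builds the named by vector, so by = setNames("Task", "Task2") is the same as by = c("Task2" = "Task").

```r
library(dplyr)

# Hypothetical helper: like DiffGoals(), but the column names are arguments.
# new/old: data frames; task_new/task_old: names of their task columns;
# keys: extra grouping columns; time: name of the column to total.
diff_tasks <- function(new, old, task_new = "Task2", task_old = "Task",
                       keys = c("CaseWorker", "Client"), time = "Time") {
  new %>%
    # keep rows of `new` whose task is absent from `old`
    anti_join(old, by = setNames(task_old, task_new)) %>%
    # one group per key combination and task
    group_by(across(all_of(c(keys, task_new)))) %>%
    # total the time column, keeping its original name
    summarise(!!time := sum(.data[[time]]), .groups = "drop")
}

# The question's data (Date omitted; character columns instead of factors)
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CaseWorker = c(rep("John", 10), rep("Kim", 3)),
  Client     = c(rep("Chris", 10), rep("Eric", 3)),
  Task2      = c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
                 rep("Do homework", 2), "Buy groceries"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

out <- diff_tasks(Df2, Df)
out
```

With the default arguments this should give the same two rows as DiffGoals(Df2, Df), returned as an ungrouped tibble.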
