How to write function in R that runs on conditional and includes other functions

Question

I'm trying to write a function in R that will do a few things all at once, and I think that function has to take two data frames to work.

In a previous question, I asked how to add rows from data frames to one another. I ended up using this code to do that, as provided in one of the answers:

MissingFromC1 <- anti_join(C2, C1, by = c("HW", "Var"))
MissingFromC1$Freq <- 0
All_c1 <- full_join(C1, MissingFromC1, by = c("HW", "Var", "Freq"))

Where C1 and C2 are two data frames made up of three columns: HW, Var, and Freq. Each HW has several Var of various frequencies. They look like this:

            C1                                               C2    
Headword   Spelling   Freq                    Headword     Spelling   Freq
 Word1       Sp1a      x                        Word1         Sp1a      x
 Word1       Sp1b      x                        Word1         Sp1c      x
 Word1       Sp1d      x                        Word2         Sp2a      x
 Word2       Sp2a      x                        Word2         Sp2b      x     
 Word3       Sp1a      x

C1 and C2 aren't the same - each includes HW and Var that aren't in the other. I wanted to make sure the two were both the same length and so the code above adds missing rows from C2 to C1 (and then I ran it again but on the other data frame).

What I want to do now is turn this into a function. But with a change - I only want to join rows where the Var is missing from a HW. I don't want to add new HW to C1 or C2, just missing Var. In fact, if a HW is in C1 but not C2, for example, then I'd like it filtering out - i.e. in the example above, Word3 is in C1 but there are no Word3 Vars in C2 at all, so I'd like it filtering out completely. (I'm wanting to compare ratios of Var for each HW, but this won't work if I have any HW made up of Var that all have Freq = 0). I hope this makes sense!

I had a go at writing the code for it, just to try and show what I'm trying to do (I realise this code is very wrong! I just thought it might help).

add.missing.to.df1 <- function(df1, df2) {
if(is.element(df2$HW, df1$HW))) 
  missing.val <- anti_join(df2, df1, by = c("HW", "Var"))
  missing.val$Fr <- 0
  All_df2 <- full_join(df1, miss.val, by = c("HW", "Var", "Fr"))
  df2_fin <- filter(All_df2, if(!is.element(df2$HW, df1$HW)))
  }

So in the end, I want to have two data frames. Each one includes HW that has at least one Var in both data frames. If HW is in C1 but not C2 (or vice versa) then I want to filter it out.

Is it possible to do all this? And is it possible to tie it all up into a function? If so, how?

Thank you to anyone who can help!

Phil · Accepted Answer

As we discussed in the comments, it looks like dplyr::inner_join() will do what you need. From the documentation:

inner_join return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combination of the matches are returned.

So using your data you could try:

library("dplyr")
df <- inner_join(C1, C2, by = c("Headword", "Spelling"))
df
#   Headword Spelling Freq.x Freq.y
# 1    Word1     Sp1a      1      1
# 2    Word2     Sp2a      4      3
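Note that joining on both columns keeps only spellings that occur in both data frames. If the aim is instead to keep every spelling of the headwords that both data frames share, dplyr::semi_join() on the headword column alone may be closer to it. A minimal sketch, assuming the columns are named Headword, Spelling and Freq as in the tables above; the Freq values are made up for illustration:

```r
library(dplyr)

# Hypothetical sample data modelled on the tables in the question
C1 <- data.frame(
  Headword = c("Word1", "Word1", "Word1", "Word2", "Word3"),
  Spelling = c("Sp1a", "Sp1b", "Sp1d", "Sp2a", "Sp1a"),
  Freq     = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)
C2 <- data.frame(
  Headword = c("Word1", "Word1", "Word2", "Word2"),
  Spelling = c("Sp1a", "Sp1c", "Sp2a", "Sp2b"),
  Freq     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Keep the rows of C1 whose Headword also occurs in C2.
# Word3 is dropped; all Word1 and Word2 spellings in C1 are kept.
C1_shared <- semi_join(C1, C2, by = "Headword")
C1_shared
```

Unlike inner_join(), semi_join() only filters its first argument and adds no columns from the second.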

As for your original question about calling two data frames in a function, this is just done with:

my_function <- function(df1, df2, ...) {
  # do some stuff here
}

Then called with my_function(df1, df2).
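
Putting the pieces from the question together, a function along these lines might do the whole job. This is only a sketch under assumptions: the function name add_missing_spellings is hypothetical, the columns are assumed to be called Headword, Spelling and Freq, and the sample Freq values are invented.

```r
library(dplyr)

# Hypothetical helper: returns df1 restricted to headwords shared with df2,
# plus a Freq = 0 row for every spelling that only df2 has.
add_missing_spellings <- function(df1, df2) {
  # Drop headwords that do not occur in both data frames
  df1_shared <- semi_join(df1, df2, by = "Headword")
  df2_shared <- semi_join(df2, df1, by = "Headword")

  # Headword/spelling pairs present in df2 but absent from df1
  missing <- anti_join(df2_shared, df1_shared, by = c("Headword", "Spelling"))
  missing <- mutate(missing, Freq = 0)

  # Append the zero-frequency rows and sort for easier comparison
  arrange(bind_rows(df1_shared, missing), Headword, Spelling)
}

# Hypothetical sample data modelled on the tables in the question
C1 <- data.frame(
  Headword = c("Word1", "Word1", "Word1", "Word2", "Word3"),
  Spelling = c("Sp1a", "Sp1b", "Sp1d", "Sp2a", "Sp1a"),
  Freq     = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)
C2 <- data.frame(
  Headword = c("Word1", "Word1", "Word2", "Word2"),
  Spelling = c("Sp1a", "Sp1c", "Sp2a", "Sp2b"),
  Freq     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Run it once in each direction to get two aligned data frames
C1_full <- add_missing_spellings(C1, C2)
C2_full <- add_missing_spellings(C2, C1)
```

With this sample data both results should contain the same headword/spelling pairs, with Word3 removed and the spellings missing from one side filled in with a frequency of 0.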
