Reputation: 137
I'm trying to write a function in R that will do a few things all at once, and I think that function has to take two data frames to work.
In a previous question, I asked how to add rows from data frames to one another. I ended up using this code to do that, as provided in one of the answers:
MissingFromC1 <- anti_join(C2, C1, by = c("HW", "Var"))
MissingFromC1$Freq <- 0
All_c1 <- full_join(C1, MissingFromC1, by = c("HW", "Var", "Freq"))
Where C1 and C2 are two data frames made up of three columns: HW, Var, and Freq. Each HW has several Var of various frequencies. They look like this:
C1                            C2
Headword  Spelling  Freq      Headword  Spelling  Freq
Word1     Sp1a      x         Word1     Sp1a      x
Word1     Sp1b      x         Word1     Sp1c      x
Word1     Sp1d      x         Word2     Sp2a      x
Word2     Sp2a      x         Word2     Sp2b      x
Word3     Sp1a      x
C1 and C2 aren't the same - each includes HW and Var that aren't in the other. I wanted to make sure the two were both the same length and so the code above adds missing rows from C2 to C1 (and then I ran it again but on the other data frame).
What I want to do now is turn this into a function. But with a change - I only want to join rows where the Var is missing from a HW. I don't want to add new HW to C1 or C2, just missing Var. In fact, if a HW is in C1 but not C2, for example, then I'd like it filtering out - i.e. in the example above, Word3 is in C1 but there are no Word3 Vars in C2 at all, so I'd like it filtering out completely. (I'm wanting to compare ratios of Var for each HW, but this won't work if I have any HW made up of Var that all have Freq = 0). I hope this makes sense!
I had a go at writing the code for it, just to try and show what I'm trying to do (I realise this code is very wrong! I just thought it might help).
add.missing.to.df1 <- function(df1, df2) {
if(is.element(df2$HW, df1$HW)))
missing.val <- anti_join(df2, df1, by = c("HW", "Var"))
missing.val$Fr <- 0
All_df2 <- full_join(df1, miss.val, by = c("HW", "Var", "Fr"))
df2_fin <- filter(All_df2, if(!is.element(df2$HW, df1$HW)))
}
So in the end, I want to have two data frames. Each one includes HW that has at least one Var in both data frames. If HW is in C1 but not C2 (or vice versa) then I want to filter it out.
Is it possible to do all this? And is it possible to tie it all up into a function? If so, how?
Thank you to anyone who can help!
Upvotes: 3
Views: 419
Reputation: 4444
As we discussed in the comments, it looks like a dplyr::inner_join() will do what you need. From the documentation:
inner_join: return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combination of the matches are returned.
So using your data you could try:
library("dplyr")
df <- inner_join(C1, C2, by = c("Headword", "Spelling"))
df
#   Headword Spelling Freq.x Freq.y
# 1    Word1     Sp1a      1      1
# 2    Word2     Sp2a      4      3
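Note that inner_join() keeps only the Headword/Spelling pairs that occur in both data frames. If the goal is only to drop headwords that are entirely absent from the other data frame (Word3 in the example), a semi_join() on Headword alone may be closer to it. A minimal sketch, assuming the column names from the example tables; the Freq values are invented for illustration:

```r
library(dplyr)

# Hypothetical data mirroring the tables in the question; Freq values are made up.
C1 <- data.frame(
  Headword = c("Word1", "Word1", "Word1", "Word2", "Word3"),
  Spelling = c("Sp1a", "Sp1b", "Sp1d", "Sp2a", "Sp1a"),
  Freq     = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)
C2 <- data.frame(
  Headword = c("Word1", "Word1", "Word2", "Word2"),
  Spelling = c("Sp1a", "Sp1c", "Sp2a", "Sp2b"),
  Freq     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Keep the rows of C1 whose Headword occurs anywhere in C2 (this drops Word3).
C1_shared <- semi_join(C1, C2, by = "Headword")

# The same filter in the other direction.
C2_shared <- semi_join(C2, C1, by = "Headword")
```

Unlike inner_join(), semi_join() never duplicates rows and returns only the columns of its first argument.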
As for your original question about passing two data frames to a function, this is just done with:
my_function <- function(df1, df2, ...) {
# do some stuff here
}
Then called with my_function(df1, df2).
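Putting the pieces together, here is one possible sketch of a function that takes both data frames. The name add_missing is hypothetical, and the columns are assumed to be Headword, Spelling and Freq as in the example tables (swap in HW and Var if those are the real names). It drops headwords that are not in both data frames, then adds a Freq = 0 row for every spelling found only in the other data frame:

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: returns df1 restricted to the
# headwords it shares with df2, plus a Freq = 0 row for each
# Headword/Spelling pair that occurs only in df2.
add_missing <- function(df1, df2) {
  # Keep only headwords that appear in both data frames.
  df1_shared <- semi_join(df1, df2, by = "Headword")
  df2_shared <- semi_join(df2, df1, by = "Headword")

  # Pairs present in df2 but absent from df1, with frequency set to zero.
  missing_rows <- anti_join(df2_shared, df1_shared,
                            by = c("Headword", "Spelling")) %>%
    mutate(Freq = 0)

  bind_rows(df1_shared, missing_rows) %>%
    arrange(Headword, Spelling)
}

# Made-up data mirroring the tables in the question.
C1 <- data.frame(
  Headword = c("Word1", "Word1", "Word1", "Word2", "Word3"),
  Spelling = c("Sp1a", "Sp1b", "Sp1d", "Sp2a", "Sp1a"),
  Freq     = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)
C2 <- data.frame(
  Headword = c("Word1", "Word1", "Word2", "Word2"),
  Spelling = c("Sp1a", "Sp1c", "Sp2a", "Sp2b"),
  Freq     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Run it once in each direction to get the two completed data frames.
C1_full <- add_missing(C1, C2)
C2_full <- add_missing(C2, C1)
```

With this example data, both results should contain the same Headword/Spelling pairs in the same order, so the Freq columns can be compared row by row.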
Upvotes: 1