Reputation: 4243
Dataframe is as follows:
Target Source Source_Match
A source1 source2
A source2 source4
A source3 source1
B source1 source2
B source3 source4
B source4 source5
C source5 source2
C source6 source3
C source7 source4
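For anyone who wants to reproduce this, the table above can be built roughly as follows. The object name `df` and the choice of character (rather than factor) columns are assumptions, not something stated in the original data.

```r
# Reproducible version of the example data.
# Assumption: columns are plain character vectors, not factors.
df <- data.frame(
  Target = rep(c("A", "B", "C"), each = 3),
  Source = c("source1", "source2", "source3",
             "source1", "source3", "source4",
             "source5", "source6", "source7"),
  Source_Match = c("source2", "source4", "source1",
                   "source2", "source4", "source5",
                   "source2", "source3", "source4"),
  stringsAsFactors = FALSE
)
```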
For each row, I want to check whether the value in "Source_Match" appears among the "Source" values of the same "Target".
Final result should look like this:
Target Source Source_Match Found In Target?
A source1 source2 Yes
A source2 source4 No
A source3 source1 Yes
B source1 source2 No
B source3 source4 Yes
B source4 source5 No
C source5 source2 No
C source6 source3 No
C source7 source4 No
Any help would be great, thanks!
Upvotes: 2
Views: 675
Reputation: 13570
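Using only base R. I'm sure there are much more efficient ways to do it, even in base R.
# Stack the (Target, Source) pairs on top of the (Target, Source_Match) pairs
df1 <- df[, c(1,2)]
df2 <- df[, c(1,3)]
colnames(df2) <- colnames(df1)
# A row in the lower half is flagged as a duplicate if the same pair appears above it.
# Caveat: a repeated (Target, Source_Match) pair is also flagged from its second occurrence on.
df$found <- duplicated(rbind(df1,df2))[(nrow(df)+1):(nrow(df)*2)]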
Output:
Target Source Source_Match found
1 A source1 source2 TRUE
2 A source2 source4 FALSE
3 A source3 source1 TRUE
4 B source1 source2 FALSE
5 B source3 source4 TRUE
6 B source4 source5 FALSE
7 C source5 source2 FALSE
8 C source6 source3 FALSE
9 C source7 source4 FALSE
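A more direct base R variant is sketched below. It builds one key per (Target, Source) pair and tests each (Target, Source_Match) key against that set, so repeated (Target, Source_Match) pairs cannot produce false positives the way they can with duplicated(). The data frame is rebuilt here so the snippet runs on its own; the "\r" separator is an arbitrary choice and is assumed not to occur in the data.

```r
# Example data from the question (assumed to be character columns).
df <- data.frame(
  Target = rep(c("A", "B", "C"), each = 3),
  Source = c("source1", "source2", "source3",
             "source1", "source3", "source4",
             "source5", "source6", "source7"),
  Source_Match = c("source2", "source4", "source1",
                   "source2", "source4", "source5",
                   "source2", "source3", "source4"),
  stringsAsFactors = FALSE
)

# One key per (Target, Source) pair and per (Target, Source_Match) pair.
key_source <- paste(df$Target, df$Source, sep = "\r")
key_match  <- paste(df$Target, df$Source_Match, sep = "\r")

# A match is "found" when its key exists among the source keys.
df$found <- key_match %in% key_source
df$found
# TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
```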
Upvotes: 0
Reputation: 66819
The dplyrish way is:
library(dplyr)
DF %>% group_by(Target) %>% mutate(found = Source_Match %in% Source)
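If you want the Yes/No labels from the question instead of TRUE/FALSE, one possible sketch wraps the test in if_else() and drops the grouping afterwards. The data is rebuilt here so the example is self-contained; the name `res` is just for illustration.

```r
library(dplyr)

# Example data from the question (assumed to be character columns).
DF <- data.frame(
  Target = rep(c("A", "B", "C"), each = 3),
  Source = c("source1", "source2", "source3",
             "source1", "source3", "source4",
             "source5", "source6", "source7"),
  Source_Match = c("source2", "source4", "source1",
                   "source2", "source4", "source5",
                   "source2", "source3", "source4"),
  stringsAsFactors = FALSE
)

res <- DF %>%
  group_by(Target) %>%                 # test within each Target only
  mutate(found = if_else(Source_Match %in% Source, "Yes", "No")) %>%
  ungroup()                            # avoid carrying the grouping along

res$found
# "Yes" "No" "Yes" "No" "Yes" "No" "No" "No" "No"
```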
The analogous data.table code is
library(data.table)
setDT(DF)
DF[, found := Source_Match %in% Source, by=Target]
If the "source" columns are of character type, %chin% can be used in place of %in%. It is a faster version specialized to this case, available in the data.table package. (Thanks, @akrun.)
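As a small illustration of the %chin% variant, assuming both columns really are character vectors (%chin% raises an error for other types):

```r
library(data.table)

# Example data from the question; data.table() keeps strings as character.
DF <- data.table(
  Target = rep(c("A", "B", "C"), each = 3),
  Source = c("source1", "source2", "source3",
             "source1", "source3", "source4",
             "source5", "source6", "source7"),
  Source_Match = c("source2", "source4", "source1",
                   "source2", "source4", "source5",
                   "source2", "source3", "source4")
)

# %chin% only works on character vectors, so check before using it.
stopifnot(is.character(DF$Source), is.character(DF$Source_Match))

DF[, found := Source_Match %chin% Source, by = Target]
DF$found
# TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
```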
And another idea, from @eddi's comment:
a faster? alternative:
DF[, found := 'No'][DF, on = .(Target, Source_Match = Source), found := 'Yes']
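To see what the update join does: every row starts as 'No', then DF is joined to itself so that a row's (Target, Source_Match) is looked up against the (Target, Source) pairs, and rows with a match are overwritten with 'Yes'. The sketch below rebuilds the example data and checks that the result agrees with the grouped %in% version; the column name `found_grouped` is only there for the comparison.

```r
library(data.table)

DF <- data.table(
  Target = rep(c("A", "B", "C"), each = 3),
  Source = c("source1", "source2", "source3",
             "source1", "source3", "source4",
             "source5", "source6", "source7"),
  Source_Match = c("source2", "source4", "source1",
                   "source2", "source4", "source5",
                   "source2", "source3", "source4")
)

# Reference result from the grouped membership test.
DF[, found_grouped := ifelse(Source_Match %in% Source, "Yes", "No"), by = Target]

# Update join: default to 'No', then flag rows whose (Target, Source_Match)
# matches some (Target, Source) pair.
DF[, found := "No"][DF, on = .(Target, Source_Match = Source), found := "Yes"]

# Both approaches should give the same labels on this data.
identical(DF$found, DF$found_grouped)
```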
Upvotes: 6