Seeing if one value exists in another column by ID

Question

My data frame is as follows:

Target  Source      Source_Match
A       source1     source2
A       source2     source4
A       source3     source1
B       source1     source2
B       source3     source4
B       source4     source5
C       source5     source2
C       source6     source3
C       source7     source4

For each row, I want to check whether the value in "Source_Match" appears anywhere in the "Source" column for the same "Target".

Final result should look like this:

Target  Source       Source_Match   Found In Target?
A       source1      source2        Yes
A       source2      source4        No
A       source3      source1        Yes
B       source1      source2        No
B       source3      source4        Yes
B       source4      source5        No
C       source5      source2        No
C       source6      source3        No
C       source7      source4        No

Any help would be great, thanks!

Frank · Accepted Answer

The dplyrish way is below; note that found comes out as a logical (TRUE/FALSE) column rather than Yes/No:

library(dplyr)
DF %>% group_by(Target) %>% mutate(found = Source_Match %in% Source)
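
To get the literal Yes/No labels from the question, the logical result can be mapped through if_else. This is a minimal sketch, assuming the columns are plain character vectors; the data frame is rebuilt from the question, and the names res and found are just illustrative choices.

```r
library(dplyr)

# Rebuild the example data from the question
DF <- data.frame(
  Target = rep(c("A", "B", "C"), each = 3),
  Source = c("source1", "source2", "source3",
             "source1", "source3", "source4",
             "source5", "source6", "source7"),
  Source_Match = c("source2", "source4", "source1",
                   "source2", "source4", "source5",
                   "source2", "source3", "source4"),
  stringsAsFactors = FALSE
)

# Within each Target, test membership of Source_Match in that group's Source
res <- DF %>%
  group_by(Target) %>%
  mutate(found = if_else(Source_Match %in% Source, "Yes", "No")) %>%
  ungroup()

print(res)
```

The ungroup() at the end is optional; it just avoids carrying the grouping into later steps.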

The analogous data.table code is:

library(data.table)
setDT(DF)
DF[, found := Source_Match %in% Source, by=Target]

If both "source" columns are of character type (not factor), %chin% can be used in place of %in%. It is a faster version of %in% specialized for character vectors, available in the data.table package. (Thanks, @akrun.)
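
As a rough sketch of the %chin% variant on the question's data: the table name DT and the Yes/No mapping via ifelse are my additions, and the type check is there because %chin% only works on character vectors (factor columns would need as.character first).

```r
library(data.table)

# Example data from the question; data.table() keeps strings as character
DT <- data.table(
  Target = rep(c("A", "B", "C"), each = 3),
  Source = c("source1", "source2", "source3",
             "source1", "source3", "source4",
             "source5", "source6", "source7"),
  Source_Match = c("source2", "source4", "source1",
                   "source2", "source4", "source5",
                   "source2", "source3", "source4")
)

# %chin% requires character input, so verify the column types first
stopifnot(is.character(DT$Source), is.character(DT$Source_Match))

# Grouped membership test, added by reference and mapped to Yes/No
DT[, found := ifelse(Source_Match %chin% Source, "Yes", "No"), by = Target]

print(DT)
```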

And another idea, from @eddi's comment:

a faster? alternative:

DF[, found := 'No'][DF, on = .(Target, Source_Match = Source), found := 'Yes']
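
To unpack that one-liner: it first sets found to 'No' everywhere, then joins the table to itself so that rows whose (Target, Source_Match) pair equals some row's (Target, Source) pair are overwritten with 'Yes'. The sketch below splits it into two steps on the question's data and checks it against the grouped %in% version. The names DT and check are hypothetical, and whether the join is actually faster would need benchmarking on real data.

```r
library(data.table)

# Example data from the question
DT <- data.table(
  Target = rep(c("A", "B", "C"), each = 3),
  Source = c("source1", "source2", "source3",
             "source1", "source3", "source4",
             "source5", "source6", "source7"),
  Source_Match = c("source2", "source4", "source1",
                   "source2", "source4", "source5",
                   "source2", "source3", "source4")
)

# Step 1: default every row to "No"
DT[, found := "No"]

# Step 2: update join of DT against itself; in on = .(x_col = i_col),
# the left name refers to the table being updated, the right to the lookup
DT[DT, on = .(Target, Source_Match = Source), found := "Yes"]

# Cross-check against the grouped %in% approach
# (groups come back in order of first appearance, matching the row order here)
check <- DT[, Source_Match %in% Source, by = Target]$V1
stopifnot(identical(DT$found == "Yes", check))

print(DT)
```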
