Nick Knauer

Reputation: 4243

Seeing if one value exists in another column by ID

My data frame is as follows:

Target  Source      Source_Match
A       source1     source2
A       source2     source4
A       source3     source1
B       source1     source2
B       source3     source4
B       source4     source5
C       source5     source2
C       source6     source3
C       source7     source4

For each row, I want to check whether the value in "Source_Match" appears among the "Source" values of the same "Target".

Final result should look like this:

Target  Source       Source_Match   Found In Target?
A       source1      source2        Yes
A       source2      source4        No
A       source3      source1        Yes
B       source1      source2        No
B       source3      source4        Yes
B       source4      source5        No
C       source5      source2        No
C       source6      source3        No
C       source7      source4        No

Any help would be great, thanks!

Upvotes: 2

Views: 675

Answers (2)

mpalanco

Reputation: 13570

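Using only base R. There are probably more efficient ways to do this.

# Stack the (Target, Source) pairs on top of the (Target, Source_Match) pairs
df1 <- df[, c(1, 2)]
df2 <- df[, c(1, 3)]
colnames(df2) <- colnames(df1)
# A row in the second half is flagged when the same pair occurs earlier in the stack.
# Caveat: a repeated (Target, Source_Match) pair is flagged too, even with no matching Source.
df$found <- duplicated(rbind(df1, df2))[(nrow(df) + 1):(nrow(df) * 2)]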

Output:

 Target  Source Source_Match found
1      A source1      source2  TRUE
2      A source2      source4 FALSE
3      A source3      source1  TRUE
4      B source1      source2 FALSE
5      B source3      source4  TRUE
6      B source4      source5 FALSE
7      C source5      source2 FALSE
8      C source6      source3 FALSE
9      C source7      source4 FALSE
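
For comparison, here is a minimal base R sketch of a key-based membership test, which does not depend on row order in a stacked data frame. The data frame is rebuilt from the question's sample. The names `key_source` and `key_match` and the `|` separator are illustrative assumptions; the separator is assumed not to occur in the data.

```r
# Sample data copied from the question
df <- data.frame(
  Target       = rep(c("A", "B", "C"), each = 3),
  Source       = paste0("source", c(1, 2, 3, 1, 3, 4, 5, 6, 7)),
  Source_Match = paste0("source", c(2, 4, 1, 2, 4, 5, 2, 3, 4)),
  stringsAsFactors = FALSE
)

# One key per (Target, Source) pair and one per (Target, Source_Match) pair
key_source <- paste(df$Target, df$Source, sep = "|")
key_match  <- paste(df$Target, df$Source_Match, sep = "|")

# TRUE when the Source_Match value occurs among the Source values of the same Target
df$found <- key_match %in% key_source
print(df)
```

Because the test is a plain `%in%` on the combined keys, repeated (Target, Source_Match) pairs do not affect the result.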

Upvotes: 0

Frank

Reputation: 66819

The dplyrish way is:

library(dplyr)
DF %>% group_by(Target) %>% mutate(found = Source_Match %in% Source)
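
The question asks for a "Yes"/"No" column rather than a logical one. A minimal sketch of how the grouped `%in%` test might be mapped to those labels, using the question's sample data; the `result` name is an assumption for illustration, and the column name is taken from the desired output:

```r
library(dplyr)

# Sample data copied from the question
DF <- data.frame(
  Target       = rep(c("A", "B", "C"), each = 3),
  Source       = paste0("source", c(1, 2, 3, 1, 3, 4, 5, 6, 7)),
  Source_Match = paste0("source", c(2, 4, 1, 2, 4, 5, 2, 3, 4)),
  stringsAsFactors = FALSE
)

result <- DF %>%
  group_by(Target) %>%
  # Within each Target, test membership and label the outcome
  mutate(`Found In Target?` = if_else(Source_Match %in% Source, "Yes", "No")) %>%
  # Drop the grouping so later operations act on the whole table
  ungroup()

print(result)
```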

The analogous data.table code is:

library(data.table)
setDT(DF)
DF[, found := Source_Match %in% Source, by=Target]

If the "source" columns are of character type, %chin% can be used in place of %in%. It is data.table's faster version of %in%, specialized for character vectors. (Thanks, @akrun.)
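
A short sketch of the `%chin%` variant on the question's sample data. It assumes both columns are character vectors, since `%chin%` does not accept factors; building the table with `data.table()` keeps strings as character.

```r
library(data.table)

# Sample data copied from the question; data.table() keeps strings as character
DF <- data.table(
  Target       = rep(c("A", "B", "C"), each = 3),
  Source       = paste0("source", c(1, 2, 3, 1, 3, 4, 5, 6, 7)),
  Source_Match = paste0("source", c(2, 4, 1, 2, 4, 5, 2, 3, 4))
)

# Same grouped membership test as with %in%, using the character-only operator
DF[, found := Source_Match %chin% Source, by = Target]
print(DF)
```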

And another idea, from @eddi's comment:

a faster? alternative:

DF[, found := 'No'][DF, on = .(Target, Source_Match = Source), found := 'Yes']
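
To show what the update join does, here is the same idea split into two steps on the question's sample data. This is a sketch for illustration, not a benchmark; whether it is actually faster is not tested here.

```r
library(data.table)

# Sample data copied from the question
DF <- data.table(
  Target       = rep(c("A", "B", "C"), each = 3),
  Source       = paste0("source", c(1, 2, 3, 1, 3, 4, 5, 6, 7)),
  Source_Match = paste0("source", c(2, 4, 1, 2, 4, 5, 2, 3, 4))
)

# Step 1: default every row to "No"
DF[, found := "No"]

# Step 2: self-join. For each (Target, Source) pair, find the rows with the
# same Target whose Source_Match equals that Source, and mark them "Yes".
DF[DF, on = .(Target, Source_Match = Source), found := "Yes"]
print(DF)
```

Rows whose (Target, Source_Match) pair has no matching (Target, Source) pair are not touched by the join and keep the default "No".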

Upvotes: 6
