common entries in two columns in data table

Question

I have two data tables with the following content:

library('data.table')
df = data.table('rank' = c(1,2,3,4,5,6,7,8,9,10),
                'h1' = c('eye', 'arm', 'elbow', 'leg', 'nose', 'ear', 'nose', 'hand', 'hair', 'finger'),
                'h2' = c('arm', 'fear', 'mouth', 'nose', 'back', 'bone', 'hand', 'hair', 'tail', 'nail'))

    rank     h1    h2
 1:    1    eye   arm
 2:    2    arm  fear
 3:    3  elbow mouth
 4:    4    leg  nose
 5:    5   nose  back
 6:    6    ear  bone
 7:    7   nose  hand
 8:    8   hand  hair
 9:    9   hair  tail
10:   10 finger  nail

df2 = data.table ('aa' = c('arm', 'leg', 'hair'), 'group' = c('up', 'down', 'up'))

     aa group
1:  arm    up
2:  leg  down
3: hair    up

I need to find the common entries between two columns in df. That's easy to do, and I have it working. df2 gives the group corresponding to each entry in df. What I need now is the common entries in df broken down by group, which would be

arm, hair (for up)

leg (for down)

Expected output is

[false, true, false, false, false, false, false, false, true, false]

[false, false, false, true, false, false, false, false, false, false]

anotherfred · Accepted Answer

You don't say exactly what form the output should take, but assuming that, for each value of group, you want a logical vector marking which rows of h1 match an aa entry belonging to that group, you can use something like this (lapply used for clarity):

# join the 2 tables so we can access the 'group' column
df3 <- df2[df, on=c(aa="h1")]
# for all unique (and non NA) values of 'group', test if each row is of that group
lapply(df3[!is.na(group),unique(group)], function(x) df3[,!is.na(group) & group==x])

which returns

[[1]]
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

[[2]]
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
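To see why the `!is.na(group)` guard is there, it may help to look at the joined table itself. The following is a small sketch that rebuilds the question's data and inspects `df3`; the column selection and the `head()` call are only for illustration and are not part of the answer above.

```r
library(data.table)

# data from the question
df <- data.table(rank = 1:10,
                 h1 = c('eye', 'arm', 'elbow', 'leg', 'nose', 'ear', 'nose', 'hand', 'hair', 'finger'),
                 h2 = c('arm', 'fear', 'mouth', 'nose', 'back', 'bone', 'hand', 'hair', 'tail', 'nail'))
df2 <- data.table(aa = c('arm', 'leg', 'hair'), group = c('up', 'down', 'up'))

# df2[df, on = ...] keeps every row of df, in df's order;
# the join column is named 'aa' but holds the values of df$h1
df3 <- df2[df, on = c(aa = "h1")]

# rows whose h1 value has no match in df2$aa get NA in 'group'
print(df3[, .(rank, aa, group)])

# without the NA guard, unmatched rows compare to NA instead of FALSE
head(df3$group == "up", 3)
# NA TRUE NA
```

Because `FALSE & NA` is `FALSE` in R, combining the comparison with `!is.na(group)` turns those NA entries into FALSE.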

Edit: To add the corresponding group names, use names like this:

results <- lapply(df3[!is.na(group),unique(group)], function(x) df3[,!is.na(group) & group==x])
# set the names of the list items to the group names
names(results) <- df3[!is.na(group),unique(group)]

I have also added comments to the code above.
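If the join is not needed for anything else, the same vectors can probably be obtained without it. This is a sketch of an alternative, not the approach used above: it splits `df2$aa` by group with base R's `split()` and tests `df$h1` against each set with `%in%`. The name `by_group` is just an illustrative choice.

```r
library(data.table)

# data from the question
df <- data.table(rank = 1:10,
                 h1 = c('eye', 'arm', 'elbow', 'leg', 'nose', 'ear', 'nose', 'hand', 'hair', 'finger'),
                 h2 = c('arm', 'fear', 'mouth', 'nose', 'back', 'bone', 'hand', 'hair', 'tail', 'nail'))
df2 <- data.table(aa = c('arm', 'leg', 'hair'), group = c('up', 'down', 'up'))

# split() returns a named list: one character vector of 'aa' values per group
aa_by_group <- split(df2$aa, df2$group)

# for each group, mark the rows of df whose h1 is one of that group's entries
by_group <- lapply(aa_by_group, function(v) df$h1 %in% v)

which(by_group$up)    # 2 9
which(by_group$down)  # 4
```

The list comes back already named by group, so the separate `names()` step is not needed. Note that `split()` orders the groups alphabetically (down, then up), whereas the join version orders them by first appearance in df.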
