user3745763
user3745763

Reputation: 9

comparing two files and outputting common elements

I have 2 files of 3 columns and hundreds of rows. I want to compare and list the common elements of first two columns of the two files. Then the list which i will get after comparing i have to add the third column of second file to that list. Third column will contain the values which were in the second file corresponding to numbers of remaining two columns which i have got as common to both the files. For example, consider two files of 6 rows and 3 columns First file -

1 2   3
2 3   4
4 6   7
3 8   9
11 10 5
19 6  14

second file -

1 4   1
2 1   4
4 6   10
3 7   2
11 10 3
19 6  5

As i said i have to compare the first two columns and then add the third column of second file to that list. Therefore, output must be:

  4  6  10
  11 10 3
  19  6 5

I have the following code, however its showing an error object not found also i am not able to add the third column. Please help :) df2 = reading first file, df3 = reading second file. Code is in R language.

s1 = 1
for(i in 1:nrow(df2)){
 for(j in 1:nrow(df3)){
     if(df2[i,1] == df3[j,1]){
        if(df2[i,2] == df3[j,2]){
             common.rows1[s1,1] <- df2[i,1]
             common.rows1[s1,2] <- df2[i,2]
             s1 = s1 + 1
         }
     }
 }

Upvotes: 0

Views: 479

Answers (4)

David Arenburg
David Arenburg

Reputation: 92282

data.table proposal

library(data.table)
setDT(df1)
setDT(df2)
setkey(df1, V1, V2)
setkey(df2, V1, V2)

df2[df1[, -3, with = F], nomatch = 0]

##    V1 V2 V3
## 1:  4  6 10
## 2: 11 10  3
## 3: 19  6  5

Upvotes: 1

ebo
ebo

Reputation: 2747

It seems that this is the perfect use-case for merge, e.g.

merge(d1[c('V1','V2')],d2)

results in:

  V1 V2 V3
1 11 10  3
2 19  6  5
3  4  6 10

In which 'V1' and 'V2' are the column names of interest.

Upvotes: 1

talat
talat

Reputation: 70266

You can use the %in% operator twice to subset your second data.frame (I call it df2):

df2[df2$V1 %in% df1$V1 & df2$V2 %in% df1$V2,]

#  V1 V2 V3
#3  4  6 10
#5 11 10  3
#6 19  6  5

V1 and V2 in my example are the column names of df1 and df2.

Upvotes: 1

MrFlick
MrFlick

Reputation: 206197

If your two tables are d1 and d2,

d1<-data.frame(
    V1 = c(1, 2, 4, 3, 11, 19),
    V2 = c(2, 3, 6, 8, 10, 6), 
    V3 = c(3, 4, 7, 9, 5, 14)
)

d2<-data.frame(
    V1 = c(1, 2, 4, 3, 11, 19), 
    V2 = c(4, 1, 6, 7, 10, 6),
    V3 = c(1, 4, 10, 2, 3, 5)
)

then you can subset d2 (in order to keep the third column) with

d2[interaction(d2$V1, d2$V2) %in% interaction(d1$V1, d1$V2),]

The interaction() treats the first two columns as a combined key.

Upvotes: 0

Related Questions