Reputation: 55
I have two data frames:
c1 <- c("chr1:981994","chr1:1025751","chr2:6614300", "chr2:6784300")
c2 <- c("G/A","C/T","A/T", "T/G")
df1 <- data.frame(c1,c2)
a <- c("chr1:981994","chr1:1000000","chr2:6614300", "chr2:6784300")
b <- c("G/G","C/C","A/A", "T/T")
c <- c("G/G","C/C","A/T", "T/T")
d <- c("G/A","C/T","A/T", "T/G")
df2 <- data.frame(a, b, c, d)
I would like to compare the two data frames and if data in column 1 matches, then compare columns b, c and d against column c2 and if at least TWO columns or more from dataframe 2 (a, b, c and/or d) are different to c2 in dataframe 1 then output this row into a new data frame. So for this example the first and last row of data frame 2 would be outputted as both column 1 entries match and columns b and c are different to c2.
I am very new to R, I tried to look at the compare function but got a little overwhelmed. Would very much appreciate any help.
Upvotes: 3
Views: 10704
Reputation: 56935
Thanks for the reproducible example.
First, you can merge with merge
. Have a look at ?merge
for other configuration options - you can specify the column to merge on using by.x
and by.y
df3 = merge(df1, df2, by.x='c1', by.y='a')
# c1 c2 b c d
# 1 chr1:981994 G/A G/G G/G G/A
# 2 chr2:6614300 A/T A/A A/T A/T
# 3 chr2:6784300 T/G T/T T/T T/G
Note that the non-matching rows in df1
and df2
are left out here.
Then you can just filter out the rows where c2
matches exactly one of columns b,c, d (then by definition it will not match the other two).
There are lots of ways to do this, e.g.
as.character(df3$c2) == df3[, c('b', 'c', 'd')]
# b c d
# [1,] FALSE FALSE TRUE
# [2,] FALSE TRUE TRUE
# [3,] FALSE FALSE TRUE
rows with only one TRUE are the ones you want.
df3[rowSums(as.character(df3$c2) == df3[, c('b', 'c', 'd')]) == 1, ]
Or you could simply loop over all the rows, or use something like apply
:
apply(df3, 1, function (row) {
sum(row['c2'] == row[c('b', 'c', 'd')]) == 1
})
# [1] TRUE FALSE TRUE
df3[.Last.value, ]
Upvotes: 3