Reputation: 75
I have two data frames: A
with 189 rows and 79 columns, and B
with 354 rows and 2 columns. Some of their row names match approximately like below:
A:
Sample value1 value2 value 3
10003 a b d
10003_Qi1 a a c
10003_Qi2 b a c
10017 b g c
10018 b f s
10025_Qi o w c
10040_Qi1 x y o
10040_ArT1 e g g
10125 p g m
10140_Ar1 w n c
10225 z c p
B:
Sample first
10003 4
10004 8
10018 45
10025 85
10032 7
10040 54
10140 2
10132 8
10200 65
10324 9
10400 32
I want to merge the two data frames based on an inexact match of their row names, which would result in this:
Sample value1 value2 value 3 first
10003 a b d 4
10018 b f s 45
10025_Qi o w c 85
10040_Qi1 x y o 54
10140_Ar1 w n c 2
The rule is:
If they match exactly, otherwise
If they match on the first five numbers, with
a. _Qi1
and _Qi2
in A
, the sample in B
gets the sample value in A
with _Qi1
b. _Qi1
and _ArT1
in A
, the sample in B
gets the sample value in A
with _Qi1
and merge
Upvotes: 1
Views: 265
Reputation: 736
Here my cumbersome solution:
A<-data.frame(matrix(c(10003,"10003_q1","10007_q1",10008,1,2,3,2,4,3,1,2),4,3))
colnames(A)<-c("sample","value1","value2")
# sample value1 value2
# 1 10003 1 4
# 2 10003_q1 2 3
# 3 10007_q1 3 1
# 4 10008 2 2
B<-data.frame(matrix(c(10003,10004,10007,10009,4,8,45,85),4,2))
colnames(B)<-c("sample","first")
# sample first
# 1 10003 4
# 2 10004 8
# 3 10007 45
# 4 10009 85
# step 1: adapt both dataframes
A$first<-NA
A$sample2<-strtrim(A$sample,5)
B$sample<-as.factor(B$sample)
# step 2: work down table A merging values from table B
# note: this assumes that B$sample is unqiue
for(i in 1:NROW(A)){
ind<-A$sample2[i]==B$sample
if(sum(ind)!=0){ # makes sure a value was found
A[i,"first"]<-B$first[ind]
}
}
# step 3: remove any duplicates of A$sample2
# note: this assumes that the 5 digit number will always come before the number+extension
A<-A[!duplicated(A$sample2),]
# sample value1 value2 first sample2
# 1 10003 1 4 4 10003
# 3 10007_q1 3 1 45 10007
# 4 10008 2 2 NA 10008
Upvotes: 1