hash
hash

Reputation: 103

Matching IDs and generating new columns

I am trying to match sample_id1, sample_id2, value from File1.txt with sampleID and valueID from count.txt and multiply count with op1 and op2. I am not sure how I can do this in a for loop and define two new columns.

File1.txt
sample_id1   sample_id2    op1   op2  value
     IX12        IX78       2     3    15
     IX12        IX81       4     0    15   
     IX78        IX81       66    43   35
     IX12        IX78       23    10   35
     IX12        IX81       43    0    55
     IX12        IX78       43    0    55

File2.txt
sampleID  valueID  count    
   IX12      15   900
   IX12      35   2000
   IX12      55   3500
   IX78      15   865
   IX78      35   1500
   IX78      35   2400 
   IX81      15   234
   IX81      35   906
   IX81      55   2001


output.txt
sample_id1   sample_id2    op1   op2  value  new_col_op1  new_col_op2 
     IX12        IX78       2     3    15        1800       2595   
     IX12        IX81       4     0    15        3600         0
     IX78        IX81       66    43   35        99000      38958       
     ........

In output.txt I multiplied op1 and op2 from first row with 900 and 865 and generated new_col_op1 and new_col_op2

Thanks.

Upvotes: 0

Views: 61

Answers (1)

akrun
akrun

Reputation: 887891

I am not sure whether this is what you wanted: (df1 and df2 are the datasets)

df11 <- transform( merge(df1[,-2], df2, by.x=c("sample_id1", "value"), by.y=c("sampleID", "valueID")),  new_col_op1=op1*count)
df12 <- transform( merge(df1[,-1], df2, by.x=c("sample_id2", "value"), by.y=c("sampleID", "valueID")),  new_col_op2=op2*count)

 res <- merge(df11, df12, by=c("value", "op1", "op2"),sort=FALSE)
  head(res,4)
 #  value op1 op2 sample_id1 count.x new_col_op1 sample_id2 count.y new_col_op2
 #1    15   2   3       IX12     900        1800       IX78     865        2595
 #2    15   4   0       IX12     900        3600       IX81     234           0
 #3    35  23  10       IX12    2000       46000       IX78    1500       15000
 #4    35  23  10       IX12    2000       46000       IX78    2400       24000

Or using dplyr

 library(dplyr)
 df11 <-  inner_join(df1[,-2],setNames(df2, c("sample_id1", "value", "count"))) %>% mutate(new_col_op1=op1*count)
 df12 <-  inner_join(df1[,-1],setNames(df2, c("sample_id2", "value", "count"))) %>% mutate(new_col_op2=op2*count)
 res1 <- inner_join(df11, df12, by=c("op1", "op2", "value"))

  head(res1,4)
  #  sample_id1 op1 op2 value count.x new_col_op1 sample_id2 count.y new_col_op2
  #1       IX12   2   3    15     900        1800       IX78     865        2595
  #2       IX12   4   0    15     900        3600       IX81     234           0
  #3       IX12  23  10    35    2000       46000       IX78    1500       15000
  #4       IX12  23  10    35    2000       46000       IX78    2400       24000

Upvotes: 1

Related Questions