user2300940
user2300940

Reputation: 2385

Subtract paired column

I have a data frame with 192 columns. I want to make a 96 column data frame by subtracting matching column based in pairing. The pairing info is found in the data frame Pairing in the column Match . The column Pos matches as substring the column names of my data frame that I want to subtract.

How do I use the pairing info in Pairing to identify which column to subtract.

  > Pairing
Match              Pos
Control_70         001_A01
Control_56         001_A02
   Case_70         001_A03
   Case_56         001_A04
Control_21         001_A05
   Case_21         001_A06


> head(matures.cpm.spike.batch[,1:6])
              001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p  16.566813  11.415796  12.400252  22.701457   8.864882  20.442599
hsa-let-7b-5p  15.574190  11.107133  12.196465  17.954547   8.527478  25.788286
hsa-let-7c-5p   5.976763   4.372978   5.984685   9.821348   6.341252   7.480211
hsa-let-7d-3p  16.508818  10.697730  11.001534  18.375286   7.583910  24.974774
hsa-let-7d-5p  13.273824   5.134547   9.456675  11.567230   7.096485  13.294108
hsa-let-7f-5p  13.900711   9.804384  11.481614  20.002110   7.878241  17.295909

Upvotes: 0

Views: 106

Answers (2)

Iaroslav Domin
Iaroslav Domin

Reputation: 2718

Just in case, a different approach:

We are going to need a modified matching data frame, with separate columns for cases and controls:

library(tidyr)
library(reshape2)

P <- Pairing %>% 
    separate(Match, into = c("cc", "ind"), sep = "_") %>% 
    dcast(ind ~ cc, value.var = "Pos")

P:

  ind    Case Control
1  21 001_A06 001_A05
2  56 001_A04 001_A02
3  70 001_A03 001_A01

We also want colnames in matures.cpm.spike.batch to match names in P:

df <- matures.cpm.spike.batch
colnames(df) <- gsub("^X|_S.*", "", colnames(df))

colnames(df):

[1] "001_A01" "001_A02" "001_A03" "001_A04" "001_A05" "001_A06"

Now we can finish it simply as following:

case <- df[, P$Case]
control <- df[, P$Control]
res <- case - control

res:

                001_A06   001_A04   001_A03
hsa-let-7a-5p 11.577717 11.285661 -4.166561
hsa-let-7b-5p 17.260808  6.847414 -3.377725
hsa-let-7c-5p  1.138959  5.448370  0.007922
hsa-let-7d-3p 17.390864  7.677556 -5.507284
hsa-let-7d-5p  6.197623  6.432683 -3.817149
hsa-let-7f-5p  9.417668 10.197726 -2.419097

Upvotes: 0

G5W
G5W

Reputation: 37641

I am assuming that for every Case there is exactly one Control and vice versa. It seemed easiest to transform your Pairing data.frame to align the Case and the Control. Once that is done, you can just build the data frame you want.

## First, recreate your data
Pairing = read.table(text="Match              Pos
Control_70         001_A01
Control_56         001_A02
   Case_70         001_A03
   Case_56         001_A04
Control_21         001_A05
   Case_21         001_A06",
header=TRUE)

matures.cpm.spike.batch = read.table(text=" 001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p  16.566813  11.415796  12.400252  22.701457   8.864882  20.442599
hsa-let-7b-5p  15.574190  11.107133  12.196465  17.954547   8.527478  25.788286
hsa-let-7c-5p   5.976763   4.372978   5.984685   9.821348   6.341252   7.480211
hsa-let-7d-3p  16.508818  10.697730  11.001534  18.375286   7.583910  24.974774
hsa-let-7d-5p  13.273824   5.134547   9.456675  11.567230   7.096485  13.294108
hsa-let-7f-5p  13.900711   9.804384  11.481614  20.002110   7.878241  17.295909",
header=TRUE)

## Build Matches to replace your Pairing
Control = Pairing[grep("Control", Pairing$Match),]
Control = Control[order(Control$Match),]
Case = Pairing[grep("Case", Pairing$Match),]
Case = Case[order(Case$Match),]
Matches = cbind(Control, Case)

# Uses Matches to build desired data.frame
Diffs = data.frame(matures.cpm.spike.batch[, Matches[1,4]] - 
        matures.cpm.spike.batch[, Matches[1,2]])
colnames(Diffs)[1] = sub("Control", "Diff", Matches[1,1])
for(i in 2:nrow(Matches)) {
    Diffs[,i] = matures.cpm.spike.batch[, Matches[i,4]] - 
        matures.cpm.spike.batch[, Matches[i,2]]
    colnames(Diffs)[i] = sub("Control", "Diff", Matches[i,1])
}

## Result
    Diff_21   Diff_56   Diff_70
1 11.577717 11.285661 -4.166561
2 17.260808  6.847414 -3.377725
3  1.138959  5.448370  0.007922
4 17.390864  7.677556 -5.507284
5  6.197623  6.432683 -3.817149
6  9.417668 10.197726 -2.419097

Upvotes: 1

Related Questions