Reputation: 2385
I have a data frame with 192 columns. I want to make a 96 column data frame by subtracting matching column based in pairing. The pairing info is found in the data frame Pairing
in the column Match
. The column Pos
matches as substring the column names of my data frame that I want to subtract.
How do I use the pairing info in Pairing
to identify which column to subtract.
> Pairing
Match Pos
Control_70 001_A01
Control_56 001_A02
Case_70 001_A03
Case_56 001_A04
Control_21 001_A05
Case_21 001_A06
> head(matures.cpm.spike.batch[,1:6])
001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p 16.566813 11.415796 12.400252 22.701457 8.864882 20.442599
hsa-let-7b-5p 15.574190 11.107133 12.196465 17.954547 8.527478 25.788286
hsa-let-7c-5p 5.976763 4.372978 5.984685 9.821348 6.341252 7.480211
hsa-let-7d-3p 16.508818 10.697730 11.001534 18.375286 7.583910 24.974774
hsa-let-7d-5p 13.273824 5.134547 9.456675 11.567230 7.096485 13.294108
hsa-let-7f-5p 13.900711 9.804384 11.481614 20.002110 7.878241 17.295909
Upvotes: 0
Views: 106
Reputation: 2718
Just in case, a different approach:
We are going to need a modified matching data frame, with separate columns for cases and controls:
library(tidyr)
library(reshape2)
P <- Pairing %>%
separate(Match, into = c("cc", "ind"), sep = "_") %>%
dcast(ind ~ cc, value.var = "Pos")
P:
ind Case Control
1 21 001_A06 001_A05
2 56 001_A04 001_A02
3 70 001_A03 001_A01
We also want colnames
in matures.cpm.spike.batch
to match names in P
:
df <- matures.cpm.spike.batch
colnames(df) <- gsub("^X|_S.*", "", colnames(df))
colnames(df):
[1] "001_A01" "001_A02" "001_A03" "001_A04" "001_A05" "001_A06"
Now we can finish it simply as following:
case <- df[, P$Case]
control <- df[, P$Control]
res <- case - control
res:
001_A06 001_A04 001_A03
hsa-let-7a-5p 11.577717 11.285661 -4.166561
hsa-let-7b-5p 17.260808 6.847414 -3.377725
hsa-let-7c-5p 1.138959 5.448370 0.007922
hsa-let-7d-3p 17.390864 7.677556 -5.507284
hsa-let-7d-5p 6.197623 6.432683 -3.817149
hsa-let-7f-5p 9.417668 10.197726 -2.419097
Upvotes: 0
Reputation: 37641
I am assuming that for every Case there is exactly one Control and vice versa. It seemed easiest to transform your Pairing data.frame to align the Case and the Control. Once that is done, you can just build the data frame you want.
## First, recreate your data
Pairing = read.table(text="Match Pos
Control_70 001_A01
Control_56 001_A02
Case_70 001_A03
Case_56 001_A04
Control_21 001_A05
Case_21 001_A06",
header=TRUE)
matures.cpm.spike.batch = read.table(text=" 001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p 16.566813 11.415796 12.400252 22.701457 8.864882 20.442599
hsa-let-7b-5p 15.574190 11.107133 12.196465 17.954547 8.527478 25.788286
hsa-let-7c-5p 5.976763 4.372978 5.984685 9.821348 6.341252 7.480211
hsa-let-7d-3p 16.508818 10.697730 11.001534 18.375286 7.583910 24.974774
hsa-let-7d-5p 13.273824 5.134547 9.456675 11.567230 7.096485 13.294108
hsa-let-7f-5p 13.900711 9.804384 11.481614 20.002110 7.878241 17.295909",
header=TRUE)
## Build Matches to replace your Pairing
Control = Pairing[grep("Control", Pairing$Match),]
Control = Control[order(Control$Match),]
Case = Pairing[grep("Case", Pairing$Match),]
Case = Case[order(Case$Match),]
Matches = cbind(Control, Case)
# Uses Matches to build desired data.frame
Diffs = data.frame(matures.cpm.spike.batch[, Matches[1,4]] -
matures.cpm.spike.batch[, Matches[1,2]])
colnames(Diffs)[1] = sub("Control", "Diff", Matches[1,1])
for(i in 2:nrow(Matches)) {
Diffs[,i] = matures.cpm.spike.batch[, Matches[i,4]] -
matures.cpm.spike.batch[, Matches[i,2]]
colnames(Diffs)[i] = sub("Control", "Diff", Matches[i,1])
}
## Result
Diff_21 Diff_56 Diff_70
1 11.577717 11.285661 -4.166561
2 17.260808 6.847414 -3.377725
3 1.138959 5.448370 0.007922
4 17.390864 7.677556 -5.507284
5 6.197623 6.432683 -3.817149
6 9.417668 10.197726 -2.419097
Upvotes: 1