Reputation: 651
I have a very large dataframe where pairs is the id, and each pair has a correlation. There are duplicate rows where the same pair might have a different correlation.
I want to plot corr1 versus corr2 for the same pair. So I thought I might have to make a new dataframe frist them plot corr1 verus corr2.
For example.
df1
pair rev_pair corr
1 A:B B:A 0.5
2 B:C C:B 0.9
3 C:D D:C 0.5
4 E:F F:E 1.0
5 A:B B:A 0.6
6 B:C C:B 1.0
df_pairs
pair corr1 corr2
A:B 0.5 0.6
B:C 0.9 1.0
Here is my attempted script. But it is taking forever. Does anyone have any suggestions.
Thank you,
i<-1
max_dup_pairs<-(length(Corr_dL[[i]]$corr)+length(Corr_sLdP[[i]]$corr)+length(Corr_sLsP[[i]]$corr))
dup_pairs<-matrix(nrow=max_dup_pairs,ncol=4)
counter<-1
for (j in 1:dim(ScoreCorr_list[[i]])[1]){ # for each row
for (k in (j+1):dim(ScoreCorr_list[[i]])[1]){ # compare to rows below
if (ScoreCorr_list[[i]][j,]$pair==ScoreCorr_list[[i]][k,]$pair){
dup_pairs[i,1]<-ScoreCorr_list[[i]][j,]$corr
dup_pairs[i,2]<-ScoreCorr_list[[i]][k,]$corr
dup_pairs[i,3]<-ScoreCorr_list[[i]][j,]$pair
dup_pairs[i,4]<-ScoreCorr_list[[i]][k,]$pair
}
}
}
ind <- apply(dup_pairs, 1, function(x) all(is.na(x)))
dup_pairs <- dup_pairs[ !ind, ]
pdf("Corr.duplicatePairsCorr.xy.pdf",width=6.07,height=3.75)
plot(dup_pairs[,1],dup_pairs[,2])
dev.off()
Upvotes: 0
Views: 75
Reputation: 887058
If you need to reshape df1
to df_pairs
library(reshape2)
df1$indx <- with(df1, paste0("corr", ave(seq_along(pair), pair, FUN=seq_along)) )
df_pairs <- dcast(df1, pair~indx, value.var="corr")
df_pairs[!rowSums(is.na(df_pairs[,-1])),]
# pair corr1 corr2
#1 A:B 0.5 0.6
#2 B:C 0.9 1.0
You may also use data.table
library(data.table)
key1 <- c("pair", "rev_pair")
DT <- data.table(df1, key=key1)
dcast(DT[DT[duplicated(DT)],][,N:= paste0("corr",1:.N), by= key1], pair~N, value.var="corr")
# pair corr1 corr2
#1 A:B 0.5 0.6
#2 B:C 0.9 1.0
Upvotes: 1
Reputation: 121568
You can use aggregate
after removing duplicated values.
dat_dup <-
dat[duplicated(dat[,1:2]) | duplicated(dat[,1:2],fromLast = TRUE) ,]
aggregate(corr~pair,data=dat_dup,FUN=I)
pair corr.1 corr.2
1 A:B 0.5 0.6
2 B:C 0.9 1.0
Upvotes: 1