Reputation: 713
I need to calculate the correlation for my data set based on two conditions. Below is an example:
df1 <- data.frame(
Main = c(0.0089, -0.050667, -0.030379, 0.066484, 0.006439, -0.026076),
B = c(NA, 0.0345, -0.0683, -0.052774, 0.014661, -0.040537),
C = c(0.0181, 0, -0.056197, 0.040794, 0.03516, -0.022662),
D = c(-0.0127, -0.025995, -0.04293, 0.057816, 0.033458, -0.058382)
)
df1
# Main B C D
# 1 0.008900 NA 0.018100 -0.012700
# 2 -0.050667 0.034500 0.000000 -0.025995
# 3 -0.030379 -0.068300 -0.056197 -0.042930
# 4 0.066484 -0.052774 0.040794 0.057816
# 5 0.006439 0.014661 0.035160 0.033458
# 6 -0.026076 -0.040537 -0.022662 -0.058382
I want to get the correlation between Main
and each of the columns when 1) Main<0
and 2) when 1) is met, columns B
, C
, or D
do not equals 0
. In this example, for Main and B, row #2, 3 & 6 (df1) fit the rules; for Main and C, row #3 & 6 fit; for Main and D, row #2, 3 & 6 fit.
In Excel, this can be done by using two if
functions. For example, to calculate the negative correlation between Main
and B
, I can use {=CORREL(IF(A1:A6<0, A1:A6), IF(A1:A6<0, IF(B1:B6<>0, B1:B6)))}
My actual data set has close to 20 columns like this and will change from time to time. I'm pretty sure R can do it but I've been stuck with this problem for hours. Any advice will be appreciated.
Upvotes: 1
Views: 621
Reputation: 6727
a=as.matrix(df1)
ind=(a[,1]<0)
a[a==0]=NA
cor(a[ind,1],a[ind,-1],use="pairwise")
Upvotes: 2
Reputation: 34703
Using data.table
:
library(data.table)
setDT(df1)
df1[Main < 0 & Reduce(`&`, lapply(df1[ , !"Main", with = FALSE], `!=`, 0)),
cor(.SD)]
# Main B C D
# Main 1 1 1 -1
# B 1 1 1 -1
# C 1 1 1 -1
# D -1 -1 -1 1
(Obviously since the subset is only two points, everything's perfectly correlated -- two points define a unique line)
If you install the development version of data.table
, this can be simplified slightly to:
df1[Main < 0 & Reduce(`&`, lapply(df1[ , !"Main"], `!=`, 0)),
cor(.SD)]
Upvotes: 0