T-T
T-T

Reputation: 713

R: Correlation calculation based on two conditions

I need to calculate the correlation for my data set based on two conditions. Below is an example:

df1 <- data.frame(
  Main = c(0.0089, -0.050667, -0.030379, 0.066484, 0.006439, -0.026076), 
  B = c(NA, 0.0345, -0.0683, -0.052774, 0.014661, -0.040537), 
  C = c(0.0181, 0, -0.056197, 0.040794, 0.03516, -0.022662), 
  D = c(-0.0127, -0.025995, -0.04293, 0.057816, 0.033458, -0.058382)
)
df1
#    Main       B          C          D
# 1  0.008900   NA         0.018100  -0.012700
# 2 -0.050667   0.034500   0.000000  -0.025995
# 3 -0.030379  -0.068300  -0.056197  -0.042930
# 4  0.066484  -0.052774   0.040794   0.057816
# 5  0.006439   0.014661   0.035160   0.033458
# 6 -0.026076  -0.040537  -0.022662  -0.058382

I want to get the correlation between Main and each of the columns when 1) Main<0 and 2) when 1) is met, columns B, C, or D do not equals 0. In this example, for Main and B, row #2, 3 & 6 (df1) fit the rules; for Main and C, row #3 & 6 fit; for Main and D, row #2, 3 & 6 fit.

In Excel, this can be done by using two if functions. For example, to calculate the negative correlation between Main and B, I can use {=CORREL(IF(A1:A6<0, A1:A6), IF(A1:A6<0, IF(B1:B6<>0, B1:B6)))}

My actual data set has close to 20 columns like this and will change from time to time. I'm pretty sure R can do it but I've been stuck with this problem for hours. Any advice will be appreciated.

Upvotes: 1

Views: 621

Answers (2)

user31264
user31264

Reputation: 6727

a=as.matrix(df1)
ind=(a[,1]<0)
a[a==0]=NA
cor(a[ind,1],a[ind,-1],use="pairwise")

Upvotes: 2

MichaelChirico
MichaelChirico

Reputation: 34703

Using data.table:

library(data.table)
setDT(df1)
df1[Main < 0 & Reduce(`&`, lapply(df1[ , !"Main", with = FALSE], `!=`, 0)), 
    cor(.SD)]
#      Main  B  C  D
# Main    1  1  1 -1
# B       1  1  1 -1
# C       1  1  1 -1
# D      -1 -1 -1  1

(Obviously since the subset is only two points, everything's perfectly correlated -- two points define a unique line)

If you install the development version of data.table, this can be simplified slightly to:

df1[Main < 0 & Reduce(`&`, lapply(df1[ , !"Main"], `!=`, 0)), 
    cor(.SD)]

Upvotes: 0

Related Questions