Shawn Hemelstrand

Reputation: 3228

Is there a package/command in R for restricted range correlations?

Dput for data frame:

hw <- structure(list(aptitude = c(78, 85, 69, 80, 60, 72, 77, 65, 70, 
80, 75, 83, 81, 65, 77, 76, 64, 68, 74, 85, 83, 80, 62, 69, 66, 
75, 68, 70), performance = c(74, 59, 59, 60, 55, 62, 59, 64, 
50, 64, 60, 59, 51, 64, 58, 49, 43, 62, 49, 59, 59, 60, 43, 62, 
49, 64, 38, 74)), class = "data.frame", row.names = c(NA, -28L
))

I have run a correlation on this dataset using the following command:

library(dplyr)        # provides the %>% pipe
library(correlation)  # provides correlation()
# Run correlation of aptitude and performance:
hw %>% 
  correlation() # r = .28, p value = .145

However, the aptitude variable has a cutoff of 60: its minimum possible value is 60, and there can be no scores below that. Given this, I am trying to correct the correlation for the restricted range in some way.

I tried looking for packages/commands in R that have this range restriction, but I'm having issues finding anything that matches this. RDocumentation lists rCCr and rangeCorrection but they don't seem to be available anymore from what I can gather.

Any help would be great!

Upvotes: 0

Views: 138

Answers (1)

user2974951

Reputation: 10375

The range your data falls in does not matter when computing a correlation coefficient. If one variable takes values in [0, 100] while another is in [0, inf), [100, 200], or some other range, this by itself won't affect the coefficient.

Maybe it would be easier to demonstrate with an example, some made-up data. Y and X both in the range [1,100].

y=rnorm(100)+seq(1,100,1)
x=rnorm(100)+seq(1,100,1)
plot(y~x)
cor(y,x)
[1] 0.9988158

The relationship is almost perfectly linear, so the Pearson correlation is very high. Now try transforming one of the variables, for example shifting Y so that it ranges over roughly [100, 200] while keeping the other as is.

cor(y+100,x)
[1] 0.9988158

It makes no difference. Why? Because you are just adding a constant to a random variable, which affects neither its variance, i.e. Var(a+Y) = Var(Y), nor its covariance with the other variable, i.e. Cov(a+Y, X) = Cov(Y, X), and these are the quantities used when estimating a correlation coefficient.
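
The argument above can also be checked numerically. This is only a minimal sketch: the seed and the shift `a` are arbitrary illustrative choices, not values from the question, and `all.equal()` is used so the comparison tolerates floating-point rounding.

```r
# Sketch: adding a constant leaves variance, covariance and correlation unchanged.
set.seed(42)                       # assumed seed, only for reproducibility
x <- rnorm(100) + seq(1, 100, 1)   # same made-up data recipe as above
y <- rnorm(100) + seq(1, 100, 1)
a <- 100                           # arbitrary shift applied to y

# Compare each quantity before and after the shift, up to numerical tolerance
var_same <- isTRUE(all.equal(var(y + a), var(y)))
cov_same <- isTRUE(all.equal(cov(y + a, x), cov(y, x)))
cor_same <- isTRUE(all.equal(cor(y + a, x), cor(y, x)))

print(c(var = var_same, cov = cov_same, cor = cor_same))
```

All three comparisons should come out TRUE, since shifting by a constant changes only the mean of the variable.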

Upvotes: 1
