C. Gupta
C. Gupta

Reputation: 33

How to find correlation coefficients in a loop?

I have a dataset like this:

Account_tenure_years = c(982,983,984,985,986,987,988)
N=c(12328,18990,21255,27996,32014,15487,4347)
Y=c(76,64,61,76,94,55,11)
df_table_account_tenure_vs_PPC = data.frame(Account_tenure_years,N,Y)

The dataset looks like this:

Account_tenure_years   N     Y
982                  12328  76
983                  18990  64
984                  21255  61
985                  27996  76
986                  32014  94
987                  15487  55
988                   4347  11

What I want to do is this:

I want to find correlation between any two of the Account_tenure_years, example, 982,983 and find the correlation coefficient with N and Y columns i.e I want to find the correlation coefficient of the below table

  Account_tenure_years   N     Y
  982                  12328  76
  983                  18990  64

Now I want to repeat this 8C2 times i.e 28 times. Taking different rows and finding the correlation coefficient in each case. i.e in the next iteration I would want :

 Account_tenure_years   N     Y
  983                  18990  64
  984                  21255  61 

And find its correlation coefficient. Now after I have received all those 28 correlation coefficients, I average them out and find a mean correlation coefficient for the entire dataset.

How do I do this in R?

Ok lets get this straight if I find out the correlation coefficient between the columns

Account_tenure_years column, N 

Also if I try to find out the correlation coefficient between the columns

Account_tenure_years column, Y

And if I find negative correlation coefficients in each case , can we infer anything from that?

Upvotes: 1

Views: 997

Answers (3)

Hobo Sheep
Hobo Sheep

Reputation: 389

I do not understand how you want to compute correlation coefficients between two variables with only one observation for each. Therefore, I assume you have more rows than provided here.

First define all combinations:

combinations <- combn(df_table_account_tenure_vs_PPC$Account_tenure_years, 2)

For each combination, you want to extract the corresponding rows and compute the correlation coefficients for each variable:

coefficients <- apply(combinations, 2, function(x, df_table_account_tenure_vs_PPC){
    coef <- sapply(c("N", "Y"), function(v, x, df_table_account_tenure_vs_PPC){
        c <- cor(df_table_account_tenure_vs_PPC[df_table_account_tenure_vs_PPC == x[1], v], df_table_account_tenure_vs_PPC[df_table_account_tenure_vs_PPC == x[2], v])
        return(c)},
    x, df_table_account_tenure_vs_PPC)
    return(c(x, coef))},
df_table_account_tenure_vs_PPC)

Then, you can aggregate your results in a data.frame:

df <- as.data.frame(t(coefficients))
colnames(df) <- c("Year1", "Year2", "N_cor", "Y_cor")

This should work. Please tell me if you have any problem. Again, make sure you have more than one observation in each condition if you want a meaningful correlation coefficient.

Upvotes: 0

Saurabh Chauhan
Saurabh Chauhan

Reputation: 3211

It is not an ideal way to calculate correlation coefficient for each case. It should be calculated for the entire dataset:

Account_tenure_years = c(982,983,984,985,986,987,988)
N=c(12328,18990,21255,27996,32014,15487,4347)
Y=c(76,64,61,76,94,55,11)
df = data.frame(Account_tenure_years,N,Y)

cor(df$Account_tenure_years,df$N)
cor(df$Account_tenure_years,df$Y)

Output is as shown below:

> cor(df$Account_tenure_years,df$N)
[1] -0.1662244
> cor(df$Account_tenure_years,df$Y)
[1] -0.5332263

You can inferred that data is negatively correlated. It means increase in the value of Account_tenure_years will decrease the value of N and Y or vice-versa.

Please feel free to correct me!

Upvotes: 1

TC Zhang
TC Zhang

Reputation: 2797

It should be easier to do this to transpose your data, And the best part is that you don't even need to write a loop.

try this:

dt <- data.table::fread("
Account_tenure_years   N     Y
982                  12328  76
983                  18990  64
984                  21255  61
985                  27996  76
986                  32014  94
987                  15487  55
988                   4347  11
")


dt.t <- as.data.frame(t(dt[, 2:3]))

colnames(dt.t) = dt$Account_tenure_years
# transpose
dt.t
#>     982   983   984   985   986   987  988
#> N 12328 18990 21255 27996 32014 15487 4347
#> Y    76    64    61    76    94    55   11

# calculate correlation matrix, read more help(cor)
cor(dt.t)
#>     982 983 984 985 986 987 988
#> 982   1   1   1   1   1   1   1
#> 983   1   1   1   1   1   1   1
#> 984   1   1   1   1   1   1   1
#> 985   1   1   1   1   1   1   1
#> 986   1   1   1   1   1   1   1
#> 987   1   1   1   1   1   1   1
#> 988   1   1   1   1   1   1   1

Created on 2018-07-20 by the reprex package (v0.2.0.9000).

Upvotes: 0

Related Questions