Reputation: 33
I have a dataset like this:
Account_tenure_years = c(982,983,984,985,986,987,988)
N=c(12328,18990,21255,27996,32014,15487,4347)
Y=c(76,64,61,76,94,55,11)
df_table_account_tenure_vs_PPC = data.frame(Account_tenure_years,N,Y)
The dataset looks like this:
Account_tenure_years N Y
982 12328 76
983 18990 64
984 21255 61
985 27996 76
986 32014 94
987 15487 55
988 4347 11
What I want to do is this:
I want to find correlation
between any two of the Account_tenure_years
, example, 982,983
and find the correlation coefficient
with N
and Y columns
i.e I want to find the correlation coefficient
of the below table
Account_tenure_years N Y
982 12328 76
983 18990 64
Now I want to repeat this 8C2 times i.e 28 times. Taking different rows and finding the correlation coefficient
in each case.
i.e in the next iteration I would want :
Account_tenure_years N Y
983 18990 64
984 21255 61
And find its correlation coefficient
. Now after I have received all those 28 correlation coefficients
, I average
them out and find a mean
correlation coefficient
for the entire dataset.
How do I do this in R?
Ok lets get this straight if I find out the correlation coefficient between the columns
Account_tenure_years column, N
Also if I try to find out the correlation coefficient between the columns
Account_tenure_years column, Y
And if I find negative correlation coefficients in each case , can we infer anything from that?
Upvotes: 1
Views: 997
Reputation: 389
I do not understand how you want to compute correlation coefficients between two variables with only one observation for each. Therefore, I assume you have more rows than provided here.
First define all combinations:
combinations <- combn(df_table_account_tenure_vs_PPC$Account_tenure_years, 2)
For each combination, you want to extract the corresponding rows and compute the correlation coefficients for each variable:
coefficients <- apply(combinations, 2, function(x, df_table_account_tenure_vs_PPC){
coef <- sapply(c("N", "Y"), function(v, x, df_table_account_tenure_vs_PPC){
c <- cor(df_table_account_tenure_vs_PPC[df_table_account_tenure_vs_PPC == x[1], v], df_table_account_tenure_vs_PPC[df_table_account_tenure_vs_PPC == x[2], v])
return(c)},
x, df_table_account_tenure_vs_PPC)
return(c(x, coef))},
df_table_account_tenure_vs_PPC)
Then, you can aggregate your results in a data.frame:
df <- as.data.frame(t(coefficients))
colnames(df) <- c("Year1", "Year2", "N_cor", "Y_cor")
This should work. Please tell me if you have any problem. Again, make sure you have more than one observation in each condition if you want a meaningful correlation coefficient.
Upvotes: 0
Reputation: 3211
It is not an ideal way to calculate correlation coefficient
for each case. It should be calculated for the entire dataset:
Account_tenure_years = c(982,983,984,985,986,987,988)
N=c(12328,18990,21255,27996,32014,15487,4347)
Y=c(76,64,61,76,94,55,11)
df = data.frame(Account_tenure_years,N,Y)
cor(df$Account_tenure_years,df$N)
cor(df$Account_tenure_years,df$Y)
Output is as shown below:
> cor(df$Account_tenure_years,df$N)
[1] -0.1662244
> cor(df$Account_tenure_years,df$Y)
[1] -0.5332263
You can inferred that data is negatively correlated. It means increase in the value of Account_tenure_years
will decrease the value of N
and Y
or vice-versa.
Please feel free to correct me!
Upvotes: 1
Reputation: 2797
It should be easier to do this to transpose your data, And the best part is that you don't even need to write a loop.
try this:
dt <- data.table::fread("
Account_tenure_years N Y
982 12328 76
983 18990 64
984 21255 61
985 27996 76
986 32014 94
987 15487 55
988 4347 11
")
dt.t <- as.data.frame(t(dt[, 2:3]))
colnames(dt.t) = dt$Account_tenure_years
# transpose
dt.t
#> 982 983 984 985 986 987 988
#> N 12328 18990 21255 27996 32014 15487 4347
#> Y 76 64 61 76 94 55 11
# calculate correlation matrix, read more help(cor)
cor(dt.t)
#> 982 983 984 985 986 987 988
#> 982 1 1 1 1 1 1 1
#> 983 1 1 1 1 1 1 1
#> 984 1 1 1 1 1 1 1
#> 985 1 1 1 1 1 1 1
#> 986 1 1 1 1 1 1 1
#> 987 1 1 1 1 1 1 1
#> 988 1 1 1 1 1 1 1
Created on 2018-07-20 by the reprex package (v0.2.0.9000).
Upvotes: 0