Egon Carter

Reputation: 39

How to calculate the correlation of 2 variables for every block of n rows in a data frame in R?

I have a data frame with 200*1000 rows and 6 columns. I want to calculate the correlation between two of the columns, cor(df$y1, df$y2), for every block of 200 rows, so that I get 1000 different correlation values as a result. When I wanted to calculate the sums over every 200 rows, I could simply use

rowsum(df,rep(1:1000,each=200))

but there is no command such as rowcor in R that I could use the same way for correlations.
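The grouping vector from the rowsum() call can be reused for correlations: split each column by that vector and call cor() on each pair of pieces. The following is a minimal base-R sketch (split() plus mapply(), a swapped-in technique rather than the answer's method), assuming the two columns are named y1 and y2; the simulated data is hypothetical.

```r
# Hypothetical data shaped like the question's: 1000 blocks of 200 rows
set.seed(1)
n_blocks <- 1000
block_size <- 200
df <- data.frame(y1 = rnorm(n_blocks * block_size),
                 y2 = rnorm(n_blocks * block_size))

# Same grouping vector as in the rowsum() call: 1,...,1, 2,...,2, ..., 1000
grp <- rep(seq_len(n_blocks), each = block_size)

# Split each column into its 200-row blocks, then correlate block by block;
# mapply() calls cor(x_i, y_i) for each pair and returns a named numeric vector
cors <- mapply(cor, split(df$y1, grp), split(df$y2, grp))

length(cors)  # 1000
```

The names of `cors` are the block labels "1" to "1000", so `cors[["3"]]` is the correlation for rows 401 to 600.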

Upvotes: 1

Views: 107

Answers (1)

akrun

Reputation: 886938

We can use a group-by approach with by(), labelling consecutive blocks of 200 rows as groups 1, 2, ..., 1000:

by(df[c('y1', 'y2')], as.integer(gl(nrow(df), 200, nrow(df))),
      FUN = function(x) cor(x$y1, x$y2))
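The gl() call only builds block labels. A small sketch comparing it with the rep() vector from the question and with an integer-division form (the variable names n, k, g_gl, g_rep and g_div are illustrative):

```r
n <- 200L * 1000L   # number of rows, as in the question
k <- 200L           # rows per block

# Answer's grouping: levels 1..n, each repeated k times, truncated to length n,
# so rows 1-200 -> 1, rows 201-400 -> 2, and so on
g_gl <- as.integer(gl(n, k, n))

# Question's grouping from the rowsum() call
g_rep <- rep(seq_len(n %/% k), each = k)

# Integer-division form; still valid when n is not a multiple of k
g_div <- (seq_len(n) - 1L) %/% k + 1L

identical(g_gl, g_rep)  # TRUE
identical(g_gl, g_div)  # TRUE
max(g_gl)               # 1000
```

If the row count were not a multiple of 200, rep(..., each = 200) would no longer match nrow(df), whereas both the gl() and integer-division forms would simply give the last block fewer rows.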

Or, using the tidyverse (dplyr):

library(dplyr)
out <- df %>%
   group_by(grp = as.integer(gl(n(), 200, n()))) %>%
   summarise(Cor = cor(y1, y2))
> dim(out)
[1] 1000    2
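Because every block has exactly 200 rows, a fully vectorized alternative (not part of the answer above) is to reshape each column into a 200 x 1000 matrix, one block per column, and compute the Pearson correlation column-wise from centred sums. This is a sketch assuming nrow(df) is an exact multiple of 200; otherwise matrix() recycles values and warns. The data below is hypothetical.

```r
# Hypothetical data with the same shape as in the answer
set.seed(24)
df <- data.frame(y1 = rnorm(200 * 1000), y2 = rnorm(200 * 1000))

k <- 200  # rows per block

# matrix() fills column by column, so column j holds rows ((j-1)*k + 1):(j*k)
X <- matrix(df$y1, nrow = k)
Y <- matrix(df$y2, nrow = k)

# Centre each block (column) on its own mean
Xc <- sweep(X, 2, colMeans(X))
Yc <- sweep(Y, 2, colMeans(Y))

# Pearson correlation per column: sum of cross-products divided by
# the square root of the product of the sums of squares
cors <- colSums(Xc * Yc) / sqrt(colSums(Xc^2) * colSums(Yc^2))

length(cors)  # 1000
```

This mirrors the rowsum() idea from the question: all 1000 correlations come from a few column-wise sums instead of 1000 separate cor() calls.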

Data used for the examples:

set.seed(24)
df <- as.data.frame(matrix(rnorm(200 *1000 * 6), ncol = 6))
names(df)[1:2] <- c('y1', 'y2')

Upvotes: 2
