Washington Muniz

Reputation: 100

How can I simplify a correlation code in R?

This is my df:

  date                    z        x            y
  <dttm>              <dbl>    <dbl>        <dbl>
1 2019-01-01 00:00:00  1333 3339072. 456700000000
2 2019-02-01 00:00:00   915 4567582. 904600000000
3 2019-03-01 00:00:00  1433 7887962. 247900000000
4 2019-04-01 00:00:00  1444 3454559. 905700000000
5 2019-05-01 00:00:00  1231 9082390. 245600000000
6 2019-06-01 00:00:00   346  781224. 346700000000

How can I simplify this code to a for loop?

df %>%
  filter(year(df$date) == 2017) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2018) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2019) %>%
  mutate(correlation = cor(x, y))

df %>%
  filter(year(df$date) == 2020) %>%
  mutate(correlation = cor(x, y))

This is what I have tried so far, but I get NAs:

years <- c(2017, 2018, 2019, 2020)
for (y in years) {
  df %>%
    filter(date == y) %>%
    mutate(correlation = cor(x, y))
    print(df$correlation[y])
}

My desired output would be something like

[1] 2017: 0.23
[1] 2018: -0.38
[1] 2019: 0.40
[1] 2020: 0.15

Upvotes: 0

Views: 32

Answers (2)

Joe Marin

Reputation: 61

To get the correlation by year, first extract the year from the dttm column so that rows can be compared by year. The year function from lubridate does this, and the loop can then filter on the new column.

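library(dplyr)
library(lubridate)

df$year <- year(df$date)

# The loop variable must not be called y: inside filter(), dplyr would
# read y as the column y of df, not as the loop variable.
for (yr in unique(df$year)) {
  correlation <- df %>%
    filter(year == yr) %>%
    summarise(correlation = cor(x, y)) %>%
    pull(correlation)
  print(paste0(yr, ": ", round(correlation, 2)), quote = FALSE)
}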

Alternatively, we can be more succinct and do the same computation with group_by and summarize.

yearDf <- df %>% 
  group_by(year) %>%
  summarize(correlation = cor(x, y))

print(yearDf)
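
To print the per-year result in the "year: value" form asked for in the question, the summarised table can be formatted with sprintf. This is only a minimal sketch: toy_df and its values are hypothetical (not the asker's data), chosen so that the correlations come out as exactly 1 and -1.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: y rises with x in 2019 and falls with x in 2020
toy_df <- tibble(
  date = as.POSIXct(c("2019-01-01", "2019-02-01", "2019-03-01",
                      "2020-01-01", "2020-02-01", "2020-03-01"), tz = "UTC"),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 3, 2, 1)
)

# One row per year, holding that year's correlation
year_cor <- toy_df %>%
  group_by(year = year(date)) %>%
  summarise(correlation = cor(x, y))

# Build one "year: correlation" string per row and print them
out <- sprintf("%d: %.2f", year_cor$year, year_cor$correlation)
writeLines(out)
# 2019: 1.00
# 2020: -1.00
```

writeLines prints the strings without the [1] prefix; print(out, quote = FALSE) would be an option if that prefix is wanted.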

Upvotes: 2

Ronak Shah

Reputation: 388982

You can group_by year and calculate the correlation between x and y within each year. Since the correlation is a single number per year, summarise is a better fit than mutate, which would repeat the same value on every row of that year.

library(dplyr)
library(lubridate)

df %>% group_by(year = year(date)) %>% summarise(correlation = cor(x, y))
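
The difference between mutate and summarise can be checked on the six rows shown in the question. This is a rough sketch that rebuilds those rows by hand (x is typed as displayed in the printed tibble, so it may be rounded); with_mutate and with_summarise are names introduced here for illustration.

```r
library(dplyr)
library(lubridate)

# The six rows from the question, all in 2019
df <- tibble(
  date = seq(as.POSIXct("2019-01-01", tz = "UTC"), by = "month", length.out = 6),
  z = c(1333, 915, 1433, 1444, 1231, 346),
  x = c(3339072, 4567582, 7887962, 3454559, 9082390, 781224),
  y = c(4.567e11, 9.046e11, 2.479e11, 9.057e11, 2.456e11, 3.467e11)
)

# mutate keeps every row and repeats the group's correlation on each of them
with_mutate <- df %>%
  group_by(year = year(date)) %>%
  mutate(correlation = cor(x, y))

# summarise collapses each year to a single row
with_summarise <- df %>%
  group_by(year = year(date)) %>%
  summarise(correlation = cor(x, y))

nrow(with_mutate)     # 6 rows, all with the same correlation value
nrow(with_summarise)  # 1 row, because the sample only contains 2019
```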

Upvotes: 1
