Reputation: 100
This is my df:
date z x y
<dttm> <dbl> <dbl> <dbl>
1 2019-01-01 00:00:00 1333 3339072. 456700000000
2 2019-02-01 00:00:00 915 4567582. 904600000000
3 2019-03-01 00:00:00 1433 7887962. 247900000000
4 2019-04-01 00:00:00 1444 3454559. 905700000000
5 2019-05-01 00:00:00 1231 9082390. 245600000000
6 2019-06-01 00:00:00 346 781224. 346700000000
How can I simplify this code to a for loop?
df %>%
filter(year(df$date) == 2017) %>%
mutate(correlation = cor(x, y))
df %>%
filter(year(df$date) == 2018) %>%
mutate(correlation = cor(x, y))
df %>%
filter(year(df$date) == 2019) %>%
mutate(correlation = cor(x, y))
df %>%
filter(year(df$date) == 2020) %>%
mutate(correlation = cor(x, y))
This is what I have tried so far, but I get some NAs:
years <- c(2017, 2018, 2019, 2020)
for (y in years) {
df %>%
filter(date == y) %>%
mutate(correlation = cor(x, y))
print(df$correlation[y])
}
My desired output would be something like
[1] 2017: 0.23
[1] 2018: -0.38
[1] 2019: 0.40
[1] 2020: 0.15
Upvotes: 0
Views: 32
Reputation: 61
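To get the correlation by year, first turn the dttm column into something that can be compared by year. The year function from lubridate does that. Two more things need fixing in the loop: the loop variable must not be called y, because inside filter and cor the name y refers to the y column of df, and the result has to be pulled out of the data frame before printing.
library(dplyr)
library(lubridate)
df$year <- year(df$date)
# Use yr as the loop variable so it does not clash with the y column
for (yr in unique(df$year)) {
  correlation <- df %>%
    filter(year == yr) %>%
    summarize(correlation = cor(x, y)) %>%
    pull(correlation)
  # Prints e.g. "2019: <value>" without quotes
  print(paste0(yr, ": ", round(correlation, 2)), quote = FALSE)
}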
Alternatively, we can be more succinct and let group_by and summarize compute one correlation per year, reusing the year column created above.
yearDf <- df %>%
group_by(year) %>%
summarize(correlation = cor(x, y))
print(yearDf)
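For anyone who wants to try the grouped version without the original data, here is a minimal sketch on a made-up tibble. The name toy and its values are assumptions, not data from the question; they are chosen so that the correlation is 1 in 2019 and -1 in 2020 (up to floating-point rounding). The last line formats the result roughly like the desired output.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: three monthly rows per year
toy <- tibble(
  date = as.POSIXct(c("2019-01-01", "2019-02-01", "2019-03-01",
                      "2020-01-01", "2020-02-01", "2020-03-01"), tz = "UTC"),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 3, 2, 1)
)

# One row per year, with the correlation of x and y within that year
by_year <- toy %>%
  group_by(year = year(date)) %>%
  summarise(correlation = cor(x, y))

# One "year: correlation" line per year
cat(sprintf("%d: %.2f\n", as.integer(by_year$year), by_year$correlation), sep = "")
```

No loop is needed here, since group_by already splits the rows by year before cor is evaluated.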
Upvotes: 2
Reputation: 388982
You can group_by year and calculate the correlation for x and y in each year. Also, since the correlation gives you only one number for each year, it is better to use summarise instead of mutate, because mutate would repeat the same value for all rows.
library(dplyr)
library(lubridate)
df %>% group_by(year = year(date)) %>% summarise(correlation = cor(x, y))
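To make the difference between mutate and summarise concrete, here is a small sketch on invented data. The tibble demo and its values are assumptions for illustration only. With mutate every input row is kept and the per-year correlation is repeated on each of them, while summarise collapses each year to a single row.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: two years with three rows each
demo <- tibble(
  date = as.POSIXct(c("2019-01-01", "2019-02-01", "2019-03-01",
                      "2020-01-01", "2020-02-01", "2020-03-01"), tz = "UTC"),
  x = c(1, 2, 4, 5, 3, 1),
  y = c(2, 3, 9, 1, 4, 8)
)

# mutate: keeps all six rows, the correlation is repeated within each year
with_mutate <- demo %>%
  group_by(year = year(date)) %>%
  mutate(correlation = cor(x, y)) %>%
  ungroup()

# summarise: one row per year
with_summarise <- demo %>%
  group_by(year = year(date)) %>%
  summarise(correlation = cor(x, y))

print(nrow(with_mutate))
print(nrow(with_summarise))
```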
Upvotes: 1