Reputation: 10996
I want to get a correlation matrix across several variables in tidyverse. However, I want to do this grouped by another column. E.g. suppose I have a data frame df
with columns year
and I want to see correlations across V1
, V2
, V3
by year.
year V1 V2 V3 misc_var
2018 5 6 5 a
2018 4 6 4 b
2018 3 2 3 NA
2013 5 8 2 4
2013 6 3 8 8
2013 4 7 5 NA
I tried sth. along the lines of
cor_output = df %>%
group_by(year) %>%
select(V1, V2, V3, year) %>%
cor(use = "pairwise.complete.obs")
However, instead of calculating the correlations from V1 to V3 for each year, it just adds the year
variable to the correlations.
The desired output should look like (please note the correlations in the output are made up)
year var V1 V2 V3
2013 V1 1 0.7 0.3
2013 V2 ... 1 ...
...
...
2018 V2 0.6 1 0.7
...
Any thoughts?
Upvotes: 5
Views: 2315
Reputation: 410
In general terms:
dataframe %>%
select(grouping_variable, columns) %>%
group_by(grouping_variable) %>%
group_modify(~ corrr::correlate(.x))
where columns
may be c(col_1, col_2, ...)
or col_1:col_10
Upvotes: 1
Reputation: 20463
One approach is to use the corrr
package in conjunction with a purrr::nest()
:
library(tidyverse)
library(corrr)
df <- tribble(
~year, ~V1, ~V2, ~V3, ~misc_var,
2018, 5, 6, 5, "a",
2018, 4, 6, 4, "b",
2018, 3, 2, 3, NA,
2013, 5, 8, 2, "4",
2013, 6, 3, 8, "8",
2013, 4, 7, 5, NA
)
df %>%
select_if(is.numeric) %>%
group_by(year) %>%
nest() %>%
mutate(
correlations = map(data, correlate)
) %>%
unnest(correlations)
#>
#> Correlation method: 'pearson'
#> Missing treated using: 'pairwise.complete.obs'
#>
#>
#> Correlation method: 'pearson'
#> Missing treated using: 'pairwise.complete.obs'
#> # A tibble: 6 x 5
#> year rowname V1 V2 V3
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2018 V1 NA 0.866 1
#> 2 2018 V2 0.866 NA 0.866
#> 3 2018 V3 1 0.866 NA
#> 4 2013 V1 NA -0.756 0.5
#> 5 2013 V2 -0.756 NA -0.945
#> 6 2013 V3 0.5 -0.945 NA
Alternatively, you could use the more experimental group_map
or group_modify
functions from dplyr
:
df %>%
select_if(is.numeric) %>%
group_by(year) %>%
group_map(~ correlate(.x)) # or group_modify(~ correlate(.x))
Upvotes: 4