Reputation: 79
Sample data:
A <- data.frame(region = c("US","US", "UK","UK","AUS","AUS"), a = c(1,2,3,4,5,8), b = c(4,5,6,7,8,2), c = c(9,6,5,43,2,5))
B <- data.frame(region = c("US","US", "UK","UK","AUS","AUS"),a = c(7,4,3,6,9,81), b = c(9,4,3,7,0,35), c = c(22,5,6,2,9,33))
Expected output:
(x is the correlation for the column between two data frames in the region)
I have tried: Binding two data frames into one and calculate correlation between two columns in one data frame. It is a bit tedious to type every column names, which also creates too many columns. Is there a simpler way to do this?
Upvotes: 1
Views: 208
Reputation: 7116
This is a little convoluted but it's an alternative way to do it.
library(tidyverse)
A <- data.frame(region = c("US","US", "UK","UK","AUS","AUS"), a = c(1,2,3,4,5,8), b = c(4,5,6,7,8,2), c = c(9,6,5,43,2,5))
B <- data.frame(region = c("US","US", "UK","UK","AUS","AUS"),a = c(7,4,3,6,9,81), b = c(9,4,3,7,0,35), c = c(22,5,6,2,9,33))
(df <- map(list(A, B), ~nest_by(.x, region)) %>%
reduce(inner_join, by = 'region'))
#> # A tibble: 3 × 3
#> # Rowwise: region
#> region data.x data.y
#> <chr> <list<tibble[,3]>> <list<tibble[,3]>>
#> 1 AUS [2 × 3] [2 × 3]
#> 2 UK [2 × 3] [2 × 3]
#> 3 US [2 × 3] [2 × 3]
bind_cols(select(df, region), map2_dfr(df$data.x, df$data.y, ~map2_dfc(.x, .y, ~cor(.x, .y))))
#> # A tibble: 3 × 4
#> # Rowwise: region
#> region a b c
#> <chr> <dbl> <dbl> <dbl>
#> 1 AUS 1 -1 1
#> 2 UK 1 1 -1
#> 3 US -1 -1 1
Created on 2022-01-06 by the reprex package (v2.0.1)
Upvotes: 0
Reputation: 8844
If my understanding is not off, then here is a solution using dplyr
and tidyr
.
library(dplyr)
library(tidyr)
rbind(cbind(set = "A", A), cbind(set = "B", B)) %>%
pivot_longer(-c(set, region)) %>%
group_by(region, name) %>%
summarise(value = cor(value[set == "A"], value[set == "B"]), .groups = "drop") %>%
pivot_wider()
Output
# A tibble: 3 x 4
region a b c
<chr> <dbl> <dbl> <dbl>
1 AUS 1 -1 1
2 UK 1 1 -1
3 US -1 -1 1
Upvotes: 2