LeMarque
LeMarque

Reputation: 783

unable to find group by year and month wise correlation of the variables and store in either a list of results or separate dataframe

I have a daatframe df, which contains the variables: Year, Month, and 4 more vars as shown below:

structure(list(Year = c(2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L, 2021L), Month = c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 
6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 
9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 
12L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 
8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L, 
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 
9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 
12L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 
7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), `NC 1234(BAR)` = c(86.68, 
81.52, 82.53, 84.17, 82.52, 79.11, 77.2, 81.13, 81.56, 78.6, 
80.8, 81.09, 82.45, 88.19, 83.83, 82.77, 79.83, 77.42, 72.42, 
73.45, 73.66, 74.36, 80.79, 83.66, 84.18, 82.46, 82.62, 84.03, 
91.04, 92.28, 97, 95.4, 100.1, 101.83, 98.16, 98.15, 99.44, 95.53, 
95.37, 97.88, 94.85, 98.25, 97.46, 99.31, 98.66, 95.21, 93.36, 
96.84, 96.78, 102.59, 103.97, 103.88, 105.25, 106.27, 106.88, 
107.42, 107.14, 102.47, 103.43, 110.6, 103.6, 100.02, 95.2, 92.41, 
90.68, 92.43, 93.81, 93.26, 95.25, 101.35, 101.64, 107.75, 109.82, 
108.89, 113.3, 118.09, 116.27, 117.26, 117.58, 118.49, 122.57, 
120.7, 120.41, 118.29, 117.68, 117.39, 118.73, 114.48, 115.74, 
113.01, 113.58, 113.98, 108.81, 107.05, 108.23, 103.08, 103.66, 
103.59, 98.76, 99.95, 101.95, 102.4, 100.56, 102.55, 99.54, 97.61, 
99.3, 99.78, 99.5, 96.92, 90.71, 94.64, 96.04, 96.7, 93.25, 90.51, 
89.17, 78.3, 85.93, 89.34, 87.72, 86.91, 85.86, 83.24, 81.06, 
78.71, 71.39, 72.72, 71.53, 68.81, 72.48, 76.51, 73.98, 72.88, 
67.09, 68.88, 64.85, 62.5, 62.32, 65.38, 65.42, 65.94, 64.99, 
65.71, 66.97, 67.43, 68.34, 68.22, 66.78, 67.28, 68.95, 67.62, 
67.33, 66.31, 66.5, 66.19, 64.8, 68.67, 73.78, 68.45, 66.83, 
66.86, 69.32, 68.21, 66.77, 65.99, 65.42, 66.57, 68.37, 68.33, 
63.54, 58.59, 53.74, 50.76, 51.71, 50.83, 53.98, 54.9, 55.25, 
52.75, 50.48, 51.17, 51.58, 52.72, 52.48, 51.72, 50.73, 51.42, 
49.52, 50.19, 47.99, 48.17, 49.85, 53.66, 59.97, 58.21, 57.73, 
54.42, 60.09, 60.35, 62.45, 62.53, 62.82, 68.22, 72.31, 77.91, 
83.96, 85.31, 83.72, 80.78, 87.98, 87.92, 90.46, 85.65, 88.4, 
85.79, 82.16, 90.84, 90.3, 92.54, 103.95, 99.38, 92.7, 91.75, 
90.89, 91.71), `TPI 2 1300(BAR)` = c(86.34, 88.75, 90.78, 88.04, 
83.64, 82.25, 80.36, 82.78, 78.15, 74.46, 73.23, 71.04, 69.87, 
77.56, 75.23, 74.85, 74.06, 73.66, 72.45, 74.6, 77.4, 77.37, 
78.99, 78.87, 80.26, 80.35, 83.83, 84.41, 83.04, 82.48, 83.27, 
84.38, 86.28, 87.34, 88.35, 89.36, 92.84, 94.2, 92.02, 91.99, 
90.33, 93.2, 92.55, 94.94, 94.69, 94.14, 92.04, 91.9, 93.28, 
95.94, 95.57, 95.83, 96.2, 95.76, 95.14, 93.91, 90.5, 84.11, 
81.86, 86.61, 80.74, 80.89, 80.93, 78.46, 76.25, 80.09, 82.18, 
81.75, 83.19, 85.47, 88.22, 92.79, 91.38, 94.8, 95.79, 97.33, 
97.26, 96.99, 99.93, 102.71, 101.8, 98.8, 94.56, 96.51, 99.43, 
98.16, 99.56, 100.64, 100.45, 99.38, 101.23, 101.45, 100.01, 
98.93, 100.74, 97.66, 92.71, 88.39, 81.37, 82.3, 86.07, 88.1, 
89.46, 86.18, 84.34, 81.21, 81.88, 79.77, 79.13, 74.25, 71.47, 
72.87, 74.9, 73.64, 70.3, 67.29, 61.74, 54.62, 60.16, 59.64, 
58.82, 58.1, 59.35, 58.2, 55.48, 51.12, 50.55, 49.65, 47.79, 
47.68, 53.54, 59.03, 59.34, 61.55, 58.65, 55.44, 54.95, 53.43, 
48.68, 56.82, 62.25, 62.89, 59.49, 59.82, 60.94, 60.41, 57.99, 
56.22, 56.29, 54.7, 55.72, 57.4, 55.58, 53.06, 52.73, 53.16, 
52.09, 52.81, 51.05, 47.67, 48.35, 48.78, 49.23, 48.99, 46.1, 
47.25, 45.68, 48.03, 50.63, 48.1, 45.64, 44.61, 41.46, 38.26, 
39.33, 38.26, 38.8, 38.27, 45.99, 44.91, 44.64, 46.82, 50.68, 
50.04, 48.73, 49.94, 50.15, 51.45, 49.68, 47.58, 47.42, 51.78, 
51.61, 52.55, 53.32, 57.34, 57.77, 56.17, 56.39, 54.19, 51.41, 
52.28, 53.98, 57.48, 62.02, 64.56, 68.58, 69.71, 68.99, 65.75, 
70.31, 66.44, 67.51, 64.86, 68.52, 64.48, 65.7, 66.35, 65.96, 
69.21, 72.25, 69.9, 68.38, 70.8, 73.92, 75.79), `TPI 4 1300(BAR)` = c(82, 
84.25, 89.48, 88.25, 85.97, 83.48, 81.23, 82.36, 83.24, 80.1, 
78.22, 76.19, 75.87, 80.31, 77.92, 76.95, 74.14, 74.2, 71.3, 
71.98, 72.2, 74.56, 79.84, 79.04, 77.41, 77.16, 82.04, 82.51, 
82.36, 79.45, 85.16, 86.65, 87.43, 87.46, 89.36, 89.86, 94.31, 
93.33, 90.76, 90.73, 90.13, 91.5, 91.61, 94.13, 92.67, 89.73, 
88.64, 94.41, 94.5, 95.58, 95.09, 94.48, 95.81, 97.29, 97.77, 
98.36, 96.27, 90.1, 93.41, 98.34, 94.42, 92.24, 90.81, 90.2, 
89.32, 91.75, 94.34, 95.37, 95.09, 98.17, 102.66, 104.7, 103.64, 
103.98, 105.38, 105.81, 103.34, 103.79, 107.84, 108.95, 108.19, 
106.57, 103.43, 101.62, 98.16, 95.66, 94.91, 97.35, 100.3, 97.33, 
98.01, 100.75, 99.55, 98.52, 98.99, 98.38, 96.69, 91.65, 85.98, 
89.27, 93.37, 95.42, 97.69, 96.87, 94.07, 90.23, 90.88, 89.33, 
88.72, 84.06, 77.46, 78.79, 82.3, 80.44, 75.19, 71.33, 67.96, 
65.52, 70.74, 69.72, 68.32, 70.42, 71.54, 66.47, 62.08, 60.18, 
62.1, 63.98, 62.19, 64.32, 65.41, 65.85, 65.36, 66.1, 63.06, 
59.62, 59.54, 57.89, 56.25, 58.97, 59.87, 60.69, 58.73, 62.48, 
68.18, 69.58, 70.18, 67.38, 70.55, 73.48, 81.09, 86.66, 82.2, 
82.33, 84.09, 87.48, 77.37, 88.17, 92.28, 85.54, 87.73, 86.9, 
85.51, 81.65, 76.05, 70.4, 62.12, 61.76, 66.71, 71.04, 59.66, 
51.58, 44.56, 49.49, 51.87, 53.15, 54.88, 55.26, 59.22, 55.3, 
54.2, 50.09, 53.13, 53.6, 54.8, 53.34, 52.27, 54.01, 54.19, 56.4, 
55.78, 57.08, 55.83, 56.21, 58.98, 61.7, 59.93, 57.18, 57.37, 
59.76, 61.34, 67.09, 69.74, 77.7, 81.86, 84.32, 97.91, 97.32, 
91.24, 87.68, 95.43, 90.13, 87.67, 83.9, 90.14, 89.31, 85.65, 
90.91, 91.61, 96.39, 100.02, 96.7, 96.03, 89.93, 90.58, 93.07
), `TCC 1200(BAR)` = c(37.25, 37.76, 39.13, 41, 42.69, 43.31, 
43.86, 43.97, 44.6, 45.11, 45.56, 45.72, 46.01, 45.47, 44.69, 
42.19, 39.28, 38.11, 37.2, 36.2, 36.7, 37.3, 39.11, 41.59, 42.04, 
41.65, 40.91, 40.8, 41.85, 42.28, 42.55, 42.52, 42.41, 42.75, 
43.02, 43.85, 44.75, 46.03, 46.35, 46.37, 46.41, 46.8, 46.44, 
46.09, 44.88, 44.16, 43.99, 44.2, 45.38, 46.09, 46.34, 46.81, 
47.31, 48.02, 48.59, 48.84, 49.58, 50.16, 50.32, 51.04, 49.98, 
48.45, 46.12, 43.55, 42.19, 41.48, 41.08, 41.13, 41.85, 42.79, 
44.02, 45.55, 45.85, 46.82, 47.74, 48.08, 48.71, 47.82, 47.63, 
47.64, 47.48, 45.06, 42, 40.48, 39.66, 38.09, 37.72, 37.96, 38.39, 
38.72, 38.56, 38.88, 38.43, 38.13, 37.31, 36.07, 34.64, 32.29, 
29.87, 28.85, 29.56, 30.43, 30.54, 30.29, 30.26, 31, 32.3, 33.51, 
34.37, 34.45, 35.4, 37.01, 38.48, 40.32, 39.76, 37.8, 36.75, 
37.3, 37.59, 38.23, 38.44, 39, 39.15, 38.78, 38, 37.36, 36.66, 
35.84, 36.06, 35.81, 35.99, 35.89, 34.71, 33.7, 33.32, 32.34, 
31.45, 30.96, 31.22, 31.57, 32.17, 32.69, 33.1, 33.19, 33.65, 
34.36, 35.1, 35.06, 34.51, 33.88, 33.52, 33.71, 34.43, 34.52, 
34.16, 33.83, 33.77, 34.27, 35.08, 35.44, 35.48, 36.4, 36.67, 
35.92, 34.82, 33.79, 32.77, 31.84, 31.5, 30.83, 30.34, 28.23, 
26.03, 24.24, 24.15, 25.71, 27.98, 29.37, 27.66, 25.38, 24.34, 
23.89, 23.56, 23.39, 24.24, 25.36, 25.48, 25.12, 24.05, 23.57, 
22.86, 22.63, 22.65, 23.57, 25.13, 26.03, 27.02, 29.13, 29.47, 
28.91, 28.93, 28.94, 29.88, 31.78, 33.47, 36.11, 41.11, 44.8, 
44.9, 45.56, 47.78, 49.95, 46.69, 39.51, 38.41, 39.57, 39.4, 
38.67, 37.45, 36.81, 39.2, 43.9, 45.44, 47.71, 50.35, 52.19)), row.names = c(NA, 
-226L), class = c("tbl_df", "tbl", "data.frame"))

Below is the code I am using:

library(readr)
library(tsibble)
library(dplyr)
library(tidyverse)
library(data.table)

not_all_na <- function(x) any(!is.na(x)) 

corrD <- df %>% 
 group_by(Year, Month) %>% 
 select(-Year, -Month) %>%  
 select_if(not_all_na) %>%
 cor(use = "pairwise.complete.obs", method = "sp")

I also tried:

   corrD <- df3 %>% 
    group_by(Year, Month) %>% 
    select(-Year, -Month) %>%  
    select_if(not_all_na) %>%
    summarise(correlations = cor(., use = "pairwise.complete.obs", method = "sp"))

But this is also not working.

What I want is:

  1. For each YEAR and Month (i.e. observations muct be grouped by first, Year and then, Month, before computing) the correlations among the rest four variables in the data
  2. If there are NAs in any of the variables, these should not be considered
  3. For each group, i.e. Year-Month (e.g. 2017-1 or 2017-2 and so forth), either a list or separate dataframe should be created so that I should be able to download it and use for further analysis.
  4. End result (correlation table) should not include Year and Month in it (in my case these are still appearing)

My code is not returning the expected outcome. Instead it is coming out something like this:

enter image description here

Can someone help?

Upvotes: 0

Views: 136

Answers (2)

Musebe Ivan
Musebe Ivan

Reputation: 182

Define the variables you want to perform a correlation storing them in the vars_keep. Then emulate group_by using split. Select the variable of interest and perform the correlation:

Note: The split function splits the df dataframe by a groups, returns list of dataframes. Each of these dataframes are subsets of the original dataframes defined by the variable.

vars_keep <- names(df)[c(3, 4, 5,6)]
some <- data %>%
  split(list(data$Year,data$Month)) %>%
  map(select, vars_keep) %>%
  map(cor)

This result to a list of correlations per split in a list some. Output for 1:

some[[1]]

enter image description here

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

You can return a list of correlation matrix for every month in every year.

library(dplyr)
library(purrr)

not_all_na <- function(x) any(!is.na(x)) 

df %>% 
  group_split(Year, Month, .keep = FALSE) %>%
  map(~.x %>% 
        select(where(not_all_na)) %>% 
        cor(use = "pairwise.complete.obs", method = "sp"))

#[[1]]
#                NC 1234(BAR) TPI 2 1300(BAR) TPI 4 1300(BAR) TCC 1200(BAR)
#NC 1234(BAR)             1.0            -0.8            -0.4          -0.2
#TPI 2 1300(BAR)         -0.8             1.0             0.8           0.4
#TPI 4 1300(BAR)         -0.4             0.8             1.0           0.8
#TCC 1200(BAR)           -0.2             0.4             0.8           1.0

#[[2]]
#                NC 1234(BAR) TPI 2 1300(BAR) TPI 4 1300(BAR) TCC 1200(BAR)
#NC 1234(BAR)             1.0             1.0             0.8          -0.4
#TPI 2 1300(BAR)          1.0             1.0             0.8          -0.4
#TPI 4 1300(BAR)          0.8             0.8             1.0          -0.8
#TCC 1200(BAR)           -0.4            -0.4            -0.8           1.0
#...
#...

Upvotes: 1

Related Questions