R Comparing specific columns based on names

Question

Related to my post: How to compare multiple specific columns in R

If I wanted to compute the cosine similarity between column a and every column whose name does not start with "A", and likewise between column b and every column whose name does not start with "B", how would I go about it? Thanks for your help.

library(tibble)

df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10),
  Adam = rnorm(10),
  Aaron = rnorm(10),
  Abby = rnorm(10),
  Brett = rnorm(10),
  Bobby = rnorm(10),
  Blaine = rnorm(10),
  Cate = rnorm(10),
  Camila = rnorm(10),
  Calvin = rnorm(10),
  Dana = rnorm(10),
  Debbie = rnorm(10),
  Derek = rnorm(10))

akrun · Accepted Answer

We can loop over the first four column names, select the current column together with the remaining columns whose first character does not match it (!=), and then apply cosine.similarity to each pair:

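library(tcR)
library(dplyr)
library(purrr)
map(names(df)[1:4], ~ {
    nm1 <- .x  # current key column: "a", "b", "c" or "d"
    df %>%
        # keep the key column plus the name columns whose initial differs from it
        select(all_of(nm1), names(.)[-(1:4)][substr(names(.)[-(1:4)],
               1, 1) != toupper(nm1)]) %>%
        # compare the key column with every other selected column
        summarise_at(-1, ~ cosine.similarity(!! rlang::sym(nm1), .))})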
#[[1]]
#     Brett     Bobby    Blaine        Cate   Camila     Calvin      Dana     Debbie        Derek
#1 1.359387 0.2699819 -0.196264 -0.03090496 1.291874 -0.1722176 0.4103589 0.02344549 -0.000173328

#[[2]]
#          Adam        Aaron        Abby         Cate       Camila     Calvin        Dana       Debbie
#1 -0.009184887 -0.001045286 0.005465617 0.0006748685 -0.002450131 0.00635276 -0.01170922 -0.002804459
#       Derek
#1 8.3403e-07

#[[3]]
#         Adam      Aaron       Abby     Brett     Bobby    Blaine      Dana     Debbie         Derek
#1 -0.03969609 0.05441983 -0.5146579 0.0233075 0.1194043 0.1218981 0.2447404 0.02858123 -0.0001220901

#[[4]]
#        Adam     Aaron      Abby      Brett    Bobby     Blaine        Cate     Camila     Calvin
#1 -0.1139157 0.2842454 -0.122818 -0.2140623 0.274513 0.06029557 0.004626398 -0.1162282 0.06058211
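
If tcR is not installed, the same loop can be sketched in base R with a hand-written helper. This is only a sketch under a few assumptions: cosine_sim, key_cols, name_cols and the seed are illustrative names and choices, not part of tcR or of the question, and the helper uses the textbook formula (dot product divided by the product of the norms). Some values in the output above fall outside [-1, 1], which the textbook formula cannot produce, so tcR's function may treat its inputs differently and this sketch should not be expected to reproduce those numbers.

```r
# Textbook cosine similarity (hypothetical helper, not the tcR function)
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
col_names <- c("a", "b", "c", "d",
               "Adam", "Aaron", "Abby", "Brett", "Bobby", "Blaine",
               "Cate", "Camila", "Calvin", "Dana", "Debbie", "Derek")
df <- as.data.frame(
  setNames(replicate(length(col_names), rnorm(10), simplify = FALSE), col_names)
)

key_cols  <- names(df)[1:4]     # a, b, c, d
name_cols <- names(df)[-(1:4)]  # Adam ... Derek

# One named numeric vector per key column
res <- lapply(setNames(key_cols, key_cols), function(key) {
  # name columns whose first letter differs from the key column
  keep <- name_cols[substr(name_cols, 1, 1) != toupper(key)]
  sapply(df[keep], cosine_sim, x = df[[key]])
})

res$a  # similarities of column a with the B, C and D names
```

Each element of res is named after its key column, so res$b, res$c and res$d hold the other three comparisons.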

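It may be better to return a single dataset in long format, with a group column identifying which of the first four columns each row was compared against: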

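library(tidyr)
map_dfr(set_names(names(df)[1:4], names(df)[1:4]), ~ {
    nm1 <- .x  # current key column; its name also becomes the 'group' id
    df %>%
        # keep the key column plus the name columns whose initial differs from it
        select(all_of(nm1), names(.)[-(1:4)][substr(names(.)[-(1:4)],
               1, 1) != toupper(nm1)]) %>%
        summarise_at(-1, ~ cosine.similarity(!! rlang::sym(nm1), .)) %>%
        # one row per compared column: name, value
        pivot_longer(everything())}, .id = 'group')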
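
As an alternative to looping, all pairwise similarities can be computed in one matrix product and filtered afterwards. The sketch below is base R only and again assumes the textbook cosine formula rather than tcR's function; the object names (m, unit, sim, long) and the seed are illustrative. The output columns are named group, name and value to mirror the long format above.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
col_names <- c("a", "b", "c", "d",
               "Adam", "Aaron", "Abby", "Brett", "Bobby", "Blaine",
               "Cate", "Camila", "Calvin", "Dana", "Debbie", "Derek")
m <- matrix(rnorm(10 * length(col_names)), nrow = 10,
            dimnames = list(NULL, col_names))

key_cols  <- col_names[1:4]
name_cols <- col_names[-(1:4)]

# Scale every column to unit length; the cross-product of two unit
# vectors is their cosine similarity.
unit <- sweep(m, 2, sqrt(colSums(m^2)), "/")
sim  <- crossprod(unit[, key_cols], unit[, name_cols])  # 4 x 12 matrix

# Reshape to long format and drop pairs that share an initial
long <- as.data.frame(as.table(sim), stringsAsFactors = FALSE)
names(long) <- c("group", "name", "value")
long <- long[substr(long$name, 1, 1) != toupper(long$group), ]
long <- long[order(long$group), ]
rownames(long) <- NULL

head(long)
```

Computing the full 4 x 12 matrix does a little redundant work for the pairs that are later dropped, but it avoids the per-column loop and keeps the filtering step in one place.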
