Reputation: 71
Related to my post: How to compare multiple specific columns in R
How would I compute the cosine similarity between column a and all the name columns that do not start with A, and similarly between column b and all the name columns that do not start with B? Thanks for your help.
library(tibble)
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10),
  Adam = rnorm(10),
  Aaron = rnorm(10),
  Abby = rnorm(10),
  Brett = rnorm(10),
  Bobby = rnorm(10),
  Blaine = rnorm(10),
  Cate = rnorm(10),
  Camila = rnorm(10),
  Calvin = rnorm(10),
  Dana = rnorm(10),
  Debbie = rnorm(10),
  Derek = rnorm(10))
Upvotes: 0
Views: 67
Reputation: 887991
We can loop over the first four column names, select the columns in the dataset whose first character does not match (!=) the uppercased name, and apply cosine.similarity to each of them:
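library(tcR)
library(dplyr)
library(purrr)
map(names(df)[1:4], ~ {
  nm1 <- .x
  # names of the columns after the first four
  nms <- names(df)[-(1:4)]
  df %>%
    # keep the key column plus the name columns with a different initial
    select(all_of(nm1), all_of(nms[substr(nms, 1, 1) != toupper(nm1)])) %>%
    summarise_at(-1, ~ cosine.similarity(!! rlang::sym(nm1), .))
})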
#[[1]]
# Brett Bobby Blaine Cate Camila Calvin Dana Debbie Derek
#1 1.359387 0.2699819 -0.196264 -0.03090496 1.291874 -0.1722176 0.4103589 0.02344549 -0.000173328
#[[2]]
# Adam Aaron Abby Cate Camila Calvin Dana Debbie
#1 -0.009184887 -0.001045286 0.005465617 0.0006748685 -0.002450131 0.00635276 -0.01170922 -0.002804459
# Derek
#1 8.3403e-07
#[[3]]
# Adam Aaron Abby Brett Bobby Blaine Dana Debbie Derek
#1 -0.03969609 0.05441983 -0.5146579 0.0233075 0.1194043 0.1218981 0.2447404 0.02858123 -0.0001220901
#[[4]]
# Adam Aaron Abby Brett Bobby Blaine Cate Camila Calvin
#1 -0.1139157 0.2842454 -0.122818 -0.2140623 0.274513 0.06029557 0.004626398 -0.1162282 0.06058211
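The name filter and the similarity itself can also be sketched in base R, which may help when sanity-checking the numbers. This is only a minimal sketch and not the tcR implementation: cos_sim is a hypothetical helper computing the textbook cosine similarity, so its results always lie in [-1, 1] and need not agree with cosine.similarity from tcR (the 1.359387 reported above suggests that function treats its inputs differently). The data frame toy is made up for illustration.

```r
# Hypothetical helper: textbook cosine similarity of two numeric vectors.
cos_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Made-up data: two key columns (a, b) and four name columns.
toy <- data.frame(
  a = c(1, 2, 3), b = c(3, 2, 1),
  Adam = c(1, 2, 3), Abby = c(-1, -2, -3),
  Brett = c(3, 2, 1), Bobby = c(1, 0, 0)
)

keys <- c("a", "b")
name_cols <- setdiff(names(toy), keys)

# For each key column, keep the name columns whose first letter differs
# from the uppercased key, then compare the key column with each of them.
res <- lapply(setNames(keys, keys), function(k) {
  others <- name_cols[substr(name_cols, 1, 1) != toupper(k)]
  vapply(toy[others], cos_sim, numeric(1), y = toy[[k]])
})

print(res)
```

Here res is a named list with one numeric vector per key column, mirroring the list returned by map above.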
It may be better to combine the results into a single dataset in long format, with the name of the compared column stored in group:
library(tidyr)
map_dfr(set_names(names(df)[1:4]), ~ {
  nm1 <- .x
  # names of the columns after the first four
  nms <- names(df)[-(1:4)]
  df %>%
    select(all_of(nm1), all_of(nms[substr(nms, 1, 1) != toupper(nm1)])) %>%
    summarise_at(-1, ~ cosine.similarity(!! rlang::sym(nm1), .)) %>%
    # one row per compared column, in columns name and value
    pivot_longer(everything())
}, .id = 'group')
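For comparison, the same single-table shape (columns group, name, value) can be sketched without purrr or tidyr. As before, this is an illustration under stated assumptions: cos_sim is a hypothetical textbook cosine similarity rather than tcR's cosine.similarity, and toy is made-up data.

```r
# Hypothetical helper: textbook cosine similarity of two numeric vectors.
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Made-up data: two key columns and one name column per initial.
toy <- data.frame(a = c(1, 0), b = c(0, 1), Adam = c(1, 1), Brett = c(2, 0))

keys <- c("a", "b")
name_cols <- setdiff(names(toy), keys)

# Build one small data frame per key column and stack them row-wise.
long <- do.call(rbind, lapply(keys, function(k) {
  others <- name_cols[substr(name_cols, 1, 1) != toupper(k)]
  data.frame(
    group = rep(k, length(others)),
    name  = others,
    value = vapply(toy[others], cos_sim, numeric(1), y = toy[[k]]),
    row.names = NULL
  )
}))

print(long)
```

Each row of long pairs a key column (group) with one compared name column, which makes the result easy to filter or plot.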
Upvotes: 1