Reputation: 805
I have a tidy dataset that has several numerical and categorical columns, for instance id, categorical_1, categorical_2, etc. Also, I got columns that follows a patter according to:
n_neighbors <- c(15,30,50,100,150)
min_distance <- c( 0.001, 0.003, 0.009,0.03,0.09)
metrics <- c("euclidean" ,"cosine","hamming")
In the example above, I just show 4 columns of the data frame. The name of the following columns refer to:
I would like to pivot the table in a way that there is new columns created: dimension_1, dimension_2, n_neighbors, min_distance, metric. If possible using deplyr and tidy dataframes
dimension_1 and dimension_2 should contain the values of the rows In the above example, while n_neighbors, min_distance, metric should contain the value written in the column name. The output that I aim for is:
row_1 -> dimension_1=0.5677311, dimension_2=-11.70898, n_neighbors=15, min_distance=0.001, metric = euclidean
row_2 -> dimension_1=0.5682809, dimension_2=-11.71056, n_neighbors=15, min_distance=0.001, metric = euclidean
row_3 -> dimension_1=0.5674967, dimension_2=-11.70665, n_neighbors=15, min_distance=0.001, metric = euclidean
row_4 -> dimension_1=0.5687164, dimension_2=-11.70848, n_neighbors=15, min_distance=0.001, metric = euclidean
row_5 -> dimension_1=-1.674230, dimension_2=16.693973, n_neighbors=15, min_distance=0.001, metric = cosine
..
..
row_8 -> dimension_1=-1.674726, dimension_2=16.69405, n_neighbors=15, min_distance=0.001, metric = cosine
Here you can find the code example for the subset of data:
structure(list(nn_.15.md_.0.001.metric.euclidean.1 = c(0.567731082439423,
0.568280875682831, 0.567496657371521, 0.568716406822205), nn_.15.md_.0.001.metric.euclidean.2 = c(-11.7089824676514,
-11.7105579376221, -11.7066516876221, -11.7084884643555), nn_.15.md_.0.001.metric.cosine.1 = c(-1.67423057556152,
-1.67501986026764, -1.69689452648163, -1.67472624778748), nn_.15.md_.0.001.metric.cosine.2 = c(16.6939735412598,
16.6941356658936, 16.7890815734863, 16.6940479278564))
Upvotes: 0
Views: 102
Reputation: 21908
You can use the following solution:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(everything(), names_to = c(NA, "n_neighbour", NA, "min_distance", NA, ".value"),
names_pattern = "(nn_\\.)(\\d+)\\.(md_\\.)(\\d\\.\\d+)\\.(metric\\.)(.*)") %>%
pivot_longer(!c(min_distance, n_neighbour),
names_to = c("metric", ".value"), names_pattern = "(.*)\\.(\\d)") %>%
rename_with(~ paste("Dimension", .), c(4, 5)) %>%
relocate(`Dimension 1`, `Dimension 2`) %>%
arrange(desc(metric))
# A tibble: 8 x 5
`Dimension 1` `Dimension 2` min_distance n_neighbour metric
<dbl> <dbl> <chr> <chr> <chr>
1 0.568 -11.7 0.001 15 euclidean
2 0.568 -11.7 0.001 15 euclidean
3 0.567 -11.7 0.001 15 euclidean
4 0.569 -11.7 0.001 15 euclidean
5 -1.67 16.7 0.001 15 cosine
6 -1.68 16.7 0.001 15 cosine
7 -1.70 16.8 0.001 15 cosine
8 -1.67 16.7 0.001 15 cosine
Upvotes: 4
Reputation: 388817
You can use pivot_longer
to get the data in long format.
library(tidyverse)
df %>%
pivot_longer(cols = everything(),
names_to = c('n_neighbors', 'min_distance', 'metric', 'row'),
names_pattern = 'nn_\\.(\\d+)\\.md_\\.(\\d+\\.\\d+)\\.metric\\.([a-z]+)\\.(\\d+)')
# n_neighbors min_distance metric row value
# <chr> <chr> <chr> <chr> <dbl>
# 1 15 0.001 euclidean 1 0.568
# 2 15 0.001 euclidean 2 -11.7
# 3 15 0.001 cosine 1 -1.67
# 4 15 0.001 cosine 2 16.7
# 5 15 0.001 euclidean 1 0.568
# 6 15 0.001 euclidean 2 -11.7
# 7 15 0.001 cosine 1 -1.68
# 8 15 0.001 cosine 2 16.7
# 9 15 0.001 euclidean 1 0.567
#10 15 0.001 euclidean 2 -11.7
#11 15 0.001 cosine 1 -1.70
#12 15 0.001 cosine 2 16.8
#13 15 0.001 euclidean 1 0.569
#14 15 0.001 euclidean 2 -11.7
#15 15 0.001 cosine 1 -1.67
#16 15 0.001 cosine 2 16.7
If you want data exactly as shown you can add a pivot_wider
step.
df %>%
pivot_longer(cols = everything(),
names_to = c('n_neighbors', 'min_distance', 'metric', 'row'),
names_pattern = 'nn_\\.(\\d+)\\.md_\\.(\\d+\\.\\d+)\\.metric\\.([a-z]+)\\.(\\d+)') %>%
pivot_wider(names_from = row, values_from = value, names_prefix = 'dimension_', values_fn = list) %>%
unnest(starts_with('dimension'))
# n_neighbors min_distance metric dimension_1 dimension_2
# <chr> <chr> <chr> <dbl> <dbl>
#1 15 0.001 euclidean 0.568 -11.7
#2 15 0.001 euclidean 0.568 -11.7
#3 15 0.001 euclidean 0.567 -11.7
#4 15 0.001 euclidean 0.569 -11.7
#5 15 0.001 cosine -1.67 16.7
#6 15 0.001 cosine -1.68 16.7
#7 15 0.001 cosine -1.70 16.8
#8 15 0.001 cosine -1.67 16.7
Upvotes: 1