Reputation: 1163
I have a list of variables and values for several subjects in a data frame (see below). I am using R/R Studio and would like to reformat the data to return every possible combination of variables, as well as their corresponding values for each subject. The actual dataset is much larger than this, so any way of doing it programmatically would be greatly appreciated.
Thankyou
My current data
Variable Subject Value
V1 A 1
V1 B 0
V1 C 1
V2 A 0
V2 B 1
V2 C 0
V3 A 1
V3 B 1
V3 C 1
My desired output:
Subject Variable 1 Variable 2 Value 1 Value 2
A V1 V2 1 0
A V1 V3 1 1
A V2 V3 0 1
B V1 V2 0 1
B V1 V3 0 1
B V2 V3 1 1
C V1 V2 1 0
C V1 V3 1 1
C V2 V3 0 1
Upvotes: 1
Views: 1144
Reputation: 26218
A base R solution, in R version 4.1.0 using native pipes
df <- read.table(text = 'Variable Subject Value
V1 A 1
V1 B 0
V1 C 1
V2 A 0
V2 B 1
V2 C 0
V3 A 1
V3 B 1
V3 C 1', header = T)
t(combn(unique(df$Variable), 2)) |> as.data.frame() |>
setNames(c('Variable1', 'Variable2')) |>
merge(df, by.x = 'Variable1', by.y = 'Variable') |>
merge(df, by.x = c('Variable2', 'Subject'), by.y = c('Variable', 'Subject'), suffixes = c('1', '2'))
#> Variable2 Subject Variable1 Value1 Value2
#> 1 V2 A V1 1 0
#> 2 V2 B V1 0 1
#> 3 V2 C V1 1 0
#> 4 V3 A V2 0 1
#> 5 V3 A V1 1 1
#> 6 V3 B V1 0 1
#> 7 V3 B V2 1 1
#> 8 V3 C V1 1 1
#> 9 V3 C V2 0 1
check it with desired
t(combn(unique(df$Variable), 2)) |> as.data.frame() |>
setNames(c('Variable1', 'Variable2')) |>
merge(df, by.x = 'Variable1', by.y = 'Variable') |>
merge(df, by.x = c('Variable2', 'Subject'), by.y = c('Variable', 'Subject'), suffixes = c('1', '2')) -> df_new
df_new[order(df_new$Subject),c(2,3,1,4,5)]
#> Subject Variable1 Variable2 Value1 Value2
#> 1 A V1 V2 1 0
#> 4 A V2 V3 0 1
#> 5 A V1 V3 1 1
#> 2 B V1 V2 0 1
#> 6 B V1 V3 0 1
#> 7 B V2 V3 1 1
#> 3 C V1 V2 1 0
#> 8 C V1 V3 1 1
#> 9 C V2 V3 0 1
Created on 2021-05-19 by the reprex package (v2.0.0)
Upvotes: 2
Reputation: 388817
You can use combn
to create all possible combinations of Variable
and Value
columns for each Subject
.
library(dplyr)
library(tidyr)
create_data_frame <- function(var, val) {
tab1 <- setNames(data.frame(t(combn(var, 2))), paste0('Variable', 1:2))
tab2 <- setNames(data.frame(t(combn(val, 2))), paste0('Value', 1:2))
cbind(tab1, tab2)
}
df %>%
group_by(Subject) %>%
summarise(data = list(create_data_frame(Variable, Value))) %>%
unnest(col = data)
# Subject Variable1 Variable2 Value1 Value2
# <chr> <chr> <chr> <int> <int>
#1 A V1 V2 1 0
#2 A V1 V3 1 1
#3 A V2 V3 0 1
#4 B V1 V2 0 1
#5 B V1 V3 0 1
#6 B V2 V3 1 1
#7 C V1 V2 1 0
#8 C V1 V3 1 1
#9 C V2 V3 0 1
Upvotes: 3