Reputation: 942
I have an excel document with different sheets (corresponding to different samples). On each sheet/sample. I have values for different categories.
The desired output would be a table with samples as columns and the variables with variable levels as rows, the values would be on the sample columns such as:
Here is an easy reproducible example in R
of two samples/dataframes (excel sheets) where I would like to have variables as rows, and samples as columns:
sample1 <- tibble::tibble(
value = c(0.38,0.22,0.18,0.12,0.1),
Variable = c("A","B","C","D","E"),
Level1 = c("China","India","UK","Italy","Egypt"),
Level2 = c("East","South","West","South","North"),
Level3 = c("Asia","Asia","Europe","Europe","Africa"))
sample2 <- tibble::tibble(
value = c(0.23,0.20,0.15,0.12,0.11),
Variable = c("A","B","F","D","E"),
Level1 = c("China","India","Mexico","Italy","Egypt"),
Level2 = c("East","South","Centre","South","North"),
Level3 = c("Asia","Asia","America","Europe","Africa"))
The desired output would be this:
output <- tibble::tibble(
Variable = c("A","B","C","D","E", "F"),
Level1 = c("China","India","UK","Italy","Egypt", "Mexico"),
Level2 = c("East","South","West","South","North","Centre"),
Level3 = c("Asia","Asia","Europe","Europe","Africa","America"),
sample1 = c(0.38,0.22,0.18,0.12,0.1,0),
sample2 = c(0.23,0.2,0,0.12,0.11,0.15))
output
Variable Level1 Level2 Level3 Sample1 Sample2
A China East Asia 0.38 0.23
B India South Asia 0.22 0.2
C UK West Europe 0.18 0
D Italy South Europe 0.12 0.12
E Egypt North Africa 0.1 0.11
F Mexico Centre America 0 0.15
Upvotes: 0
Views: 51
Reputation: 6489
Using data.table
approach, you could proceed as follows:
library(data.table)
sample1$sample <- "sample1"
sample2$sample <- "sample2"
dcast(rbind(setDT(sample1), sample2), ... ~ sample, value.var = "value", fill = 0)
# Variable Level1 Level2 Level3 sample1 sample2
# 1: A China East Asia 0.38 0.23
# 2: B India South Asia 0.22 0.20
# 3: C UK West Europe 0.18 0.00
# 4: D Italy South Europe 0.12 0.12
# 5: E Egypt North Africa 0.10 0.11
# 6: F Mexico Centre America 0.00 0.15
Upvotes: 1
Reputation: 33498
Using data.table
:
library(data.table)
setDT(sample1)
setDT(sample2)
merge(
sample1, sample2,
by = c("Variable", paste0("Level", 1:3)),
all = TRUE, suffixes = c(".sample1", "sample2"),
)
# Variable Level1 Level2 Level3 value.sample1 valuesample2
# 1: A China East Asia 0.38 0.23
# 2: B India South Asia 0.22 0.20
# 3: C UK West Europe 0.18 NA
# 4: D Italy South Europe 0.12 0.12
# 5: E Egypt North Africa 0.10 0.11
# 6: F Mexico Centre America NA 0.15
Using tidyr
:
full_join(
sample1, sample2,
by = c("Variable", paste0("Level", 1:3)),
suffix = c("_sample1", "_sample2")
)
# value_sample1 Variable Level1 Level2 Level3 value_sample2
# <dbl> <chr> <chr> <chr> <chr> <dbl>
# 1 0.38 A China East Asia 0.23
# 2 0.22 B India South Asia 0.2
# 3 0.18 C UK West Europe NA
# 4 0.12 D Italy South Europe 0.12
# 5 0.1 E Egypt North Africa 0.11
# 6 NA F Mexico Centre America 0.15
Upvotes: 1
Reputation: 138
sample1$name <- rep("S1",nrow(sample1))
sample2$name <- rep("S2",nrow(sample2))
outputs <- rbind(sample1,sample2)
outputs_reshape <-
dcast(Variable+Level1+Level2+Level3~name,data=outputs,fill=0)
outputs_reshape
Upvotes: 1