Ecg
Ecg

Reputation: 942

Merge excel sheets/ Rdataframes into a common data frame keeping samples as columns

I have an excel document with different sheets (corresponding to different samples). On each sheet/sample. I have values for different categories.

The desired output would be a table with samples as columns and the variables with variable levels as rows, the values would be on the sample columns such as:

Here is an easy reproducible example in R of two samples/dataframes (excel sheets) where I would like to have variables as rows, and samples as columns:

sample1 <- tibble::tibble(
  value = c(0.38,0.22,0.18,0.12,0.1), 
  Variable = c("A","B","C","D","E"),
  Level1 = c("China","India","UK","Italy","Egypt"),
  Level2 = c("East","South","West","South","North"),
  Level3 = c("Asia","Asia","Europe","Europe","Africa"))

sample2 <- tibble::tibble(
  value = c(0.23,0.20,0.15,0.12,0.11), 
  Variable = c("A","B","F","D","E"),
  Level1 = c("China","India","Mexico","Italy","Egypt"),
  Level2 = c("East","South","Centre","South","North"),
  Level3 = c("Asia","Asia","America","Europe","Africa"))

The desired output would be this:

output <- tibble::tibble(
  Variable = c("A","B","C","D","E", "F"),
  Level1 = c("China","India","UK","Italy","Egypt", "Mexico"),
  Level2 = c("East","South","West","South","North","Centre"),
  Level3 = c("Asia","Asia","Europe","Europe","Africa","America"),
  sample1 = c(0.38,0.22,0.18,0.12,0.1,0),
  sample2 = c(0.23,0.2,0,0.12,0.11,0.15))


output
    Variable    Level1  Level2  Level3  Sample1 Sample2
    A   China   East    Asia    0.38    0.23
    B   India   South   Asia    0.22    0.2
    C   UK  West    Europe  0.18    0
    D   Italy   South   Europe  0.12    0.12
    E   Egypt   North   Africa  0.1 0.11
    F   Mexico  Centre  America 0   0.15

Upvotes: 0

Views: 51

Answers (3)

B. Christian Kamgang
B. Christian Kamgang

Reputation: 6489

Using data.table approach, you could proceed as follows:

library(data.table)

sample1$sample <- "sample1"
sample2$sample <- "sample2"

dcast(rbind(setDT(sample1), sample2), ... ~ sample, value.var = "value", fill = 0)

#    Variable Level1 Level2  Level3 sample1 sample2
# 1:        A  China   East    Asia    0.38    0.23
# 2:        B  India  South    Asia    0.22    0.20
# 3:        C     UK   West  Europe    0.18    0.00
# 4:        D  Italy  South  Europe    0.12    0.12
# 5:        E  Egypt  North  Africa    0.10    0.11
# 6:        F Mexico Centre America    0.00    0.15

Upvotes: 1

s_baldur
s_baldur

Reputation: 33498

Using data.table:

library(data.table)
setDT(sample1)
setDT(sample2)
merge(
  sample1, sample2, 
  by = c("Variable", paste0("Level", 1:3)), 
  all = TRUE, suffixes = c(".sample1", "sample2"),
)

#    Variable Level1 Level2  Level3 value.sample1 valuesample2
# 1:        A  China   East    Asia          0.38         0.23
# 2:        B  India  South    Asia          0.22         0.20
# 3:        C     UK   West  Europe          0.18           NA
# 4:        D  Italy  South  Europe          0.12         0.12
# 5:        E  Egypt  North  Africa          0.10         0.11
# 6:        F Mexico Centre America            NA         0.15

Using tidyr:

full_join(
  sample1, sample2, 
  by = c("Variable", paste0("Level", 1:3)), 
  suffix = c("_sample1", "_sample2")
)

#   value_sample1 Variable Level1 Level2 Level3  value_sample2
#           <dbl> <chr>    <chr>  <chr>  <chr>           <dbl>
# 1          0.38 A        China  East   Asia             0.23
# 2          0.22 B        India  South  Asia             0.2 
# 3          0.18 C        UK     West   Europe          NA   
# 4          0.12 D        Italy  South  Europe           0.12
# 5          0.1  E        Egypt  North  Africa           0.11
# 6         NA    F        Mexico Centre America          0.15

Upvotes: 1

Prahlad
Prahlad

Reputation: 138

sample1$name <- rep("S1",nrow(sample1))
sample2$name <- rep("S2",nrow(sample2))
outputs <- rbind(sample1,sample2)
outputs_reshape <- 
dcast(Variable+Level1+Level2+Level3~name,data=outputs,fill=0)
outputs_reshape

Upvotes: 1

Related Questions