tom
tom

Reputation: 63

How to combine rows from different sheets into one sheet?

I have 5 excel sheets, sharing same number of rows and columns. I would like to make one sheet by combining rows from these 5 sheets in a way that each row from these sheets come below to each other. How is this possible in Excel / R / or any other programming language. I have shared my sample input and desired output..

Input data:

SHEET 1:

Year A B
2000 0.2 0.3
2001 0.5 0.4
2002 0.1 0.2
2003 0.2 0.3
2004 0.5 0.1
2005 0.2 0.2

SHEET 2:

Year st1 st2
2000 blue blue
2001 red red
2002 yellow yellow
2003 green green
2004 white white
2005 black black

Desired output

Year x1 x2
2000 0.2 0.3
2000 blue blue
2001 0.5 0.4
2001 red red
2002 0.1 0.2
2002 yellow yellow
2003 0.2 0.3
2003 green green
2004 0.5 0.1
2004 white white
2005 0.2 0.2
2005 black black

I would appreciate any help in this regard. Thank you in advance.

Upvotes: 1

Views: 673

Answers (3)

jay.sf
jay.sf

Reputation: 73802

Here is an R function that merges sheets with common time column, either read in using openxlsx::read.xlsx or read.csv. Depending on the setting you can produce different output formats.

Additionally, if it happens you have identical names in the separate sheets, so mergeing gets tricky, we can provide new names to rename them.

> rd_merge <- \(r, new_names, tcol=1, long=FALSE) {
+   yr <- el(lapply(r, names))[tcol]
+   ncv <- el(lapply(r, ncol)) - 1L
+   if (!missing(new_names)) {
+     r <- Map(setNames, r, 
+              Map('c', yr, lapply(new_names, paste, 
+                                  if (ncv > 1) seq_len(ncv) else '', sep=''))
+     )
+   }
+   if (long) {
+     r <- Map(`[<-`, r, '.id', value=seq_along(r))
+   } else {
+     stopifnot(Reduce(identical, sapply(r, names)[1, ]))
+   }
+   Reduce(\(...) merge(..., all=TRUE), x=r)
+ }

Usage

Excel data as shown in OP

> rd_ex <- Map(openxlsx::read.xlsx, 'foo.xlsx', sheet=1:2)  ## recommended
> rd_merge(rd_ex)
  Year   A   B    st1    st2
1 2000 0.2 0.3   blue   blue
2 2001 0.5 0.4    red    red
3 2002 0.1 0.2 yellow yellow
4 2003 0.2 0.3  green  green
5 2004 0.5 0.1  white  white
6 2005 0.2 0.2  black  black
> rd_merge(rd_ex, new_names=c('foo', 'bar'))  ## same, but using new names
  Year foo1 foo2   bar1   bar2
1 2000  0.2  0.3   blue   blue
2 2001  0.5  0.4    red    red
3 2002  0.1  0.2 yellow yellow
4 2003  0.2  0.3  green  green
5 2004  0.5  0.1  white  white
6 2005  0.2  0.2  black  black
> rd_merge(rd_ex, long=TRUE, new_names='x')  ## desired (?) long format
   Year     x1     x2 .id
1  2000    0.2    0.3   1
2  2000   blue   blue   2
3  2001    0.5    0.4   1
4  2001    red    red   2
5  2002    0.1    0.2   1
6  2002 yellow yellow   2
7  2003    0.2    0.3   1
8  2003  green  green   2
9  2004    0.5    0.1   1
10 2004  white  white   2
11 2005    0.2    0.2   1
12 2005  black  black   2

Here an example with sheets that have identical names,

> rd_ex <- Map(openxlsx::read.xlsx, 'foo2.xlsx', sheet=1:2)
> lapply(rd_ex, names)
$`foo2.xlsx`
[1] "Year" "st1" 

$<NA>
[1] "Year" "st1" 

that we can't merge that easily. But with the function we can provide new names.

> rd_merge(rd_ex, new_names=c('foo', 'bar'))
  Year foo    bar
1 2000 0.2   blue
2 2001 0.5    red
3 2002 0.1 yellow
4 2003 0.2  green
5 2004 0.5  white
6 2005 0.2  black

We can do the same with your read in .csv files.

> rd_csv <- lapply(c('foo21.csv', 'foo21.csv'), read.csv)
> lapply(rd_csv, names)
[[1]]
[1] "Year" "st1" 

[[2]]
[1] "Year" "st1" 
> rd_merge(rd_csv, new_names=c('temperature', 'ppt'))
  Year temperature    ppt
1 2000         0.2   blue
2 2001         0.5    red
3 2002         0.1 yellow
4 2003         0.2  green
5 2004         0.5  white
6 2005         0.2  black

Upvotes: 0

SamR
SamR

Reputation: 20512

Here is an R response. Essentially:

  1. lapply() over sheet names, reading each sheet into a list.
  2. Skip the header row and assign the same column names so we can bind them into one data frame by name.
  3. Set col_types = "text" for the columns we are going to bind, so there is no issue with combining character and numeric vectors.
  4. Add row number to each data frame.
  5. Bind into one data frame.
  6. Sort this data frame by row number to achieve desired output.
  7. Remove row number.
library(dplyr)
lapply(
    readxl::excel_sheets(file_path),
    \(sheet)
    readxl::read_excel(
        file_path,
        sheet = sheet,
        skip = 1,
        col_names = c("year", "x1", "x2"),
        col_types = c("numeric", "text", "text")
    ) |>
        mutate(rn = row_number())
) |>
    bind_rows() |>
    arrange(rn) |>
    select(-rn)

# # A tibble: 12 × 3
#     year x1     x2
#    <dbl> <chr>  <chr>
#  1  2000 0.2    0.3
#  2  2000 blue   blue
#  3  2001 0.5    0.4
#  4  2001 red    red
#  5  2002 0.1    0.2
#  6  2002 yellow yellow
#  7  2003 0.2    0.3
#  8  2003 green  green
#  9  2004 0.5    0.1
# 10  2004 white  white
# 11  2005 0.2    0.2
# 12  2005 black  black 

Frame challenge: join rather than combine columns

This is kind of awkward and it feels wrong to combine character and numeric vectors. It would be more natural to join them:

lapply(
    readxl::excel_sheets(file_path),
    \(sheet)
    readxl::read_excel(file_path, sheet = sheet)
) |>
    Reduce(
        \(x, y) full_join(x, y, by = "Year"),
        x = _
    )

#    Year     A     B st1    st2   
#   <dbl> <dbl> <dbl> <chr>  <chr> 
# 1  2000   0.2   0.3 blue   blue  
# 2  2001   0.5   0.4 red    red   
# 3  2002   0.1   0.2 yellow yellow
# 4  2003   0.2   0.3 green  green 
# 5  2004   0.5   0.1 white  white 
# 6  2005   0.2   0.2 black  black 

This is better - our numbers are numeric and our text is character. Using Reduce() over the list means this approach will generalise to any number of sheets.

Upvotes: 2

Mayukh Bhattacharya
Mayukh Bhattacharya

Reputation: 27458

Since in the OP its written using any programming language, whether Excel/R/Other hence posting it an answer using Excel Formulas

enter image description here


=VSTACK({"Year","x1","x2"},SORT(VSTACK(Table_One,Table_Two)))

NOTE:

  • The above formula works with the MS365 in current channel.
  • Highly suggested to convert the range of data in sheets into Structured References aka Tables and then use the above formula.
  • This can also be accomplished using Power Query as well.

Upvotes: 1

Related Questions