Reputation: 63
I have 5 excel sheets, sharing same number of rows and columns. I would like to make one sheet by combining rows from these 5 sheets in a way that each row from these sheets come below to each other. How is this possible in Excel / R / or any other programming language. I have shared my sample input and desired output..
Input data:
SHEET 1:
Year | A | B |
---|---|---|
2000 | 0.2 | 0.3 |
2001 | 0.5 | 0.4 |
2002 | 0.1 | 0.2 |
2003 | 0.2 | 0.3 |
2004 | 0.5 | 0.1 |
2005 | 0.2 | 0.2 |
SHEET 2:
Year | st1 | st2 |
---|---|---|
2000 | blue | blue |
2001 | red | red |
2002 | yellow | yellow |
2003 | green | green |
2004 | white | white |
2005 | black | black |
Desired output
Year | x1 | x2 |
---|---|---|
2000 | 0.2 | 0.3 |
2000 | blue | blue |
2001 | 0.5 | 0.4 |
2001 | red | red |
2002 | 0.1 | 0.2 |
2002 | yellow | yellow |
2003 | 0.2 | 0.3 |
2003 | green | green |
2004 | 0.5 | 0.1 |
2004 | white | white |
2005 | 0.2 | 0.2 |
2005 | black | black |
I would appreciate any help in this regard. Thank you in advance.
Upvotes: 1
Views: 673
Reputation: 73802
Here is an R function that merges sheets with common time column, either read in using openxlsx::read.xlsx
or read.csv
. Depending on the setting you can produce different output formats.
Additionally, if it happens you have identical names in the separate sheets, so merge
ing gets tricky, we can provide new names to rename them.
> rd_merge <- \(r, new_names, tcol=1, long=FALSE) {
+ yr <- el(lapply(r, names))[tcol]
+ ncv <- el(lapply(r, ncol)) - 1L
+ if (!missing(new_names)) {
+ r <- Map(setNames, r,
+ Map('c', yr, lapply(new_names, paste,
+ if (ncv > 1) seq_len(ncv) else '', sep=''))
+ )
+ }
+ if (long) {
+ r <- Map(`[<-`, r, '.id', value=seq_along(r))
+ } else {
+ stopifnot(Reduce(identical, sapply(r, names)[1, ]))
+ }
+ Reduce(\(...) merge(..., all=TRUE), x=r)
+ }
Excel data as shown in OP
> rd_ex <- Map(openxlsx::read.xlsx, 'foo.xlsx', sheet=1:2) ## recommended
> rd_merge(rd_ex)
Year A B st1 st2
1 2000 0.2 0.3 blue blue
2 2001 0.5 0.4 red red
3 2002 0.1 0.2 yellow yellow
4 2003 0.2 0.3 green green
5 2004 0.5 0.1 white white
6 2005 0.2 0.2 black black
> rd_merge(rd_ex, new_names=c('foo', 'bar')) ## same, but using new names
Year foo1 foo2 bar1 bar2
1 2000 0.2 0.3 blue blue
2 2001 0.5 0.4 red red
3 2002 0.1 0.2 yellow yellow
4 2003 0.2 0.3 green green
5 2004 0.5 0.1 white white
6 2005 0.2 0.2 black black
> rd_merge(rd_ex, long=TRUE, new_names='x') ## desired (?) long format
Year x1 x2 .id
1 2000 0.2 0.3 1
2 2000 blue blue 2
3 2001 0.5 0.4 1
4 2001 red red 2
5 2002 0.1 0.2 1
6 2002 yellow yellow 2
7 2003 0.2 0.3 1
8 2003 green green 2
9 2004 0.5 0.1 1
10 2004 white white 2
11 2005 0.2 0.2 1
12 2005 black black 2
Here an example with sheets that have identical names,
> rd_ex <- Map(openxlsx::read.xlsx, 'foo2.xlsx', sheet=1:2)
> lapply(rd_ex, names)
$`foo2.xlsx`
[1] "Year" "st1"
$<NA>
[1] "Year" "st1"
that we can't merge that easily. But with the function we can provide new names.
> rd_merge(rd_ex, new_names=c('foo', 'bar'))
Year foo bar
1 2000 0.2 blue
2 2001 0.5 red
3 2002 0.1 yellow
4 2003 0.2 green
5 2004 0.5 white
6 2005 0.2 black
We can do the same with your read in .csv files.
> rd_csv <- lapply(c('foo21.csv', 'foo21.csv'), read.csv)
> lapply(rd_csv, names)
[[1]]
[1] "Year" "st1"
[[2]]
[1] "Year" "st1"
> rd_merge(rd_csv, new_names=c('temperature', 'ppt'))
Year temperature ppt
1 2000 0.2 blue
2 2001 0.5 red
3 2002 0.1 yellow
4 2003 0.2 green
5 2004 0.5 white
6 2005 0.2 black
Upvotes: 0
Reputation: 20512
Here is an R response. Essentially:
lapply()
over sheet names, reading each sheet into a list.col_types = "text"
for the columns we are going to bind, so there is no issue with combining character and numeric vectors.library(dplyr)
lapply(
readxl::excel_sheets(file_path),
\(sheet)
readxl::read_excel(
file_path,
sheet = sheet,
skip = 1,
col_names = c("year", "x1", "x2"),
col_types = c("numeric", "text", "text")
) |>
mutate(rn = row_number())
) |>
bind_rows() |>
arrange(rn) |>
select(-rn)
# # A tibble: 12 × 3
# year x1 x2
# <dbl> <chr> <chr>
# 1 2000 0.2 0.3
# 2 2000 blue blue
# 3 2001 0.5 0.4
# 4 2001 red red
# 5 2002 0.1 0.2
# 6 2002 yellow yellow
# 7 2003 0.2 0.3
# 8 2003 green green
# 9 2004 0.5 0.1
# 10 2004 white white
# 11 2005 0.2 0.2
# 12 2005 black black
This is kind of awkward and it feels wrong to combine character and numeric vectors. It would be more natural to join them:
lapply(
readxl::excel_sheets(file_path),
\(sheet)
readxl::read_excel(file_path, sheet = sheet)
) |>
Reduce(
\(x, y) full_join(x, y, by = "Year"),
x = _
)
# Year A B st1 st2
# <dbl> <dbl> <dbl> <chr> <chr>
# 1 2000 0.2 0.3 blue blue
# 2 2001 0.5 0.4 red red
# 3 2002 0.1 0.2 yellow yellow
# 4 2003 0.2 0.3 green green
# 5 2004 0.5 0.1 white white
# 6 2005 0.2 0.2 black black
This is better - our numbers are numeric
and our text is character
. Using Reduce()
over the list means this approach will generalise to any number of sheets.
Upvotes: 2
Reputation: 27458
Since in the OP its written using any programming language, whether Excel/R/Other
hence posting it an answer using Excel Formulas
=VSTACK({"Year","x1","x2"},SORT(VSTACK(Table_One,Table_Two)))
NOTE:
MS365
in current channel.Structured References
aka Tables
and then use the above formula.Power Query
as well.Upvotes: 1