Reputation: 111
So, I am currently working on some datasets with Google Sheets and looking into automatizing some of my processes, such as calculating the f1 scores of my data. Since I usually have a bunch of worksheets at a time with the same structures, I usually have spreadsheets containing worksheets with the data and separate spreadsheets containing the "template" of my F1-score calculation (or other contents I might wanna add). I'm now wondering what ways there are to automatically merge the two together (as opposed to copy pasting them), all results I've thought of so far into are somewhat hacky and I'd prefer a clean solution that I can apply to any similar issues.
To make it more clear, I've prepared a minimal example. My data might be following the structure of this table:
Data Text | Expected | Result |
---|---|---|
data 1 | abc | abc |
data 2 | def | |
data 3 | def | def |
data 4 |
Then my template might look something like this (I could also add the column headers of the data if that would help in any way, but I usually don't):
Entity | Interesting Value | Interesting Value 2 | Interesting Value 3 | |||
---|---|---|---|---|---|---|
abc | just | a | template | |||
def | some calculations | some contents | things |
And then the result would have both combined like this:
Data Text | Expected | Result | Entity | Interesting Value | Interesting Value 2 | Interesting Value 3 |
---|---|---|---|---|---|---|
data 1 | abc | abc | abc | just | a | template |
data 2 | def | def | some calculations | some contents | things | |
data 3 | def | def | ||||
data 4 |
Again, I'm looking for a universal approach that will still work even if the structure of the above files differs. E.g they might follow the same principle, but maybe the data has more columns, the template differs, etc. However, my templates never overwrite columns that are given in the main data (e.g. if the data has 5 columns, the template will start at column 6 or later).
I'm writing my code in google colaboratory.
Upvotes: 0
Views: 222
Reputation: 7783
I had thought this was a sort of "answered" question, but - as originally discussed with the OP in the comments - was unable to find a decent generic solution.
So I wrote one.
Here is a custom function, written in Google AppScript that will combine [n] tables into one master table, given the "header" text of a column in each table with some common identifying, unique value. If there are conflicts, the FIRST table referenced will "win". So you should put your tables in order with that in mind.
This is the spreadsheet where it was originally built. Feel free to File>Make a Copy to access the script editor and copy the code to your own sheet.
Below is the code:
/**
* Merges ranges using a common column to join data.
*
* Multiple ranges are merged so that the rows that have the same
* value in an identifier column go together in the result array.
*
* @param {"ID"} id_column_name The name of a column that is common to all the ranges.
* @param {Sheet1!A1:D} first_range A range to merge with others.
* @param {Sheet2!A1:D} second_range Repeatable. Any additional ranges to merge with first_range.
* @return {Object[][]} The ranges merged into one array.
* @customfunction
*/
function MergeRanges(id_column_name, first_range, second_range) {
// version 1.0, written by Matt:King, 21 September 2022
// * @license https://www.gnu.org/licenses/gpl-3.0.html
if (arguments.length < 3) {
throw new Error(`Wrong number of arguments to MergeRanges. Expected at least 3 arguments, but got ${arguments.length} arguments.`);
}
let data = [];
for (let i = 1; i < arguments.length; i++) {
data.push(arguments[i]);
}
data = data.map(e => [
e[0],
e.slice(1),
e[0].indexOf(id_column_name)
]);
const errorTableIndex = data.map(e => e[2]).indexOf(-1);
if (errorTableIndex !== -1) {
throw new Error(`MergeRanges could not find the id_column_name '${id_column_name}' in range ${errorTableIndex + 1}.`);
}
data = data.map(e => [
e[0],
e[1].filter(f => f[e[2]]),
e[2]
]);
let commonCol = data.map(e => e[1].map(f => f[e[2]])).flat();
commonCol = [...new Set(commonCol)];
let headers = data.map(e => e[0]).flat();
headers = [...new Set(headers)].sort((a, b) => b === id_column_name ? 0 : 1);
let output = commonCol.map(e => [e].concat(new Array(headers.length - 1)));
data = data.map(e => [
e[1],
e[1].map(f => f[e[2]]).map(g => commonCol.indexOf(g)),
e[0].map(f => headers.indexOf(f))
]);
data.forEach(d => {
for (let i = 0; i < d[1].length; i++) {
for (let j = 0; j < d[2].length; j++) {
if (output[d[1][i]][d[2][j]]) { continue; }
output[d[1][i]][d[2][j]] = d[0][i][j];
}
}
});
return [headers].concat(output);
}
Upvotes: 1