Egor P

Reputation: 19

Append unique rows only

I'm having difficulty building a script that appends only the rows that are not already present on the sheet.

I have a master sheet into which I import csvData. At the moment, I have one script that imports the CSV data and another that clears possible duplicates from the sheet after the import. Although this works, the duplicate-removal script uses .clearContents() and has to clear the sheet entirely before writing back the list of unique rows. Because the spreadsheet is also used externally (via AppSheet), this creates a risk of unrecorded or corrupted data if a user tries to add anything to the sheet while the script is running.

Because of this, I am trying to build another script that would import csvData into an empty array and then compare it with the array read from the master sheet. Based on that comparison, the script would append only the rows that are not yet present on the master sheet.

Unfortunately, I was unable to find examples of duplicate removal that do not clear the sheet's contents. I had another idea, counting repetitions and selecting only the rows that had none, but I wasn't able to write a working script or find an example for that either.

Below is my original script, which includes both the import and the duplicate removal using .clearContents():

function importEMS() {
  var fSource=DriveApp.getFolderById('folder id removed'); 
  var fi=fSource.getFilesByName('EMS.csv'); 
  var ss=SpreadsheetApp.openById('sheet id removed');

  //CONVERT CSV FILE TO A TABLE
  if (fi.hasNext()) { 
    var file=fi.next();
    var csv=file.getBlob().getDataAsString();
    var csvData=CSVToArray(csv);
    var timestamp = new Date();
    var tsh=ss.getSheetByName('Main');
    for (var i=1;i<csvData.length-1;i++) {
      csvData[i][8] = timestamp;
    }

   //APPEND NEW ROWS

    for (var i=1;i<csvData.length;i++) {
      tsh.appendRow(csvData[i]);
    }
  }

  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName("Main");
  var data = sheet.getDataRange().getValues();
  var newData = new Array();
  for(i in data){
    var row = data[i];
    var duplicate = false;
    for(j in newData){

      if(row.slice(0,7).join().toLowerCase() == newData[j].slice(0,7).join().toLowerCase()){
        duplicate = true;
      }
    }
    if(!duplicate){
      newData.push(row);
    }
  }
  // Clear the existing info and update with newData.
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}

Does anyone have an idea of how to append new rows only instead of the entire csvData to the master sheet?

Thanks for the help in advance!

UPD: Added a link to a data sample. The real main sheet contains 20 columns; I removed some of them from the sample, but the column order is identical. The CSV import sheet shows the type of data that I import. On that sheet, I highlighted in green the rows I'm trying to append to the main sheet.

UPD2: The solution provided by Oleg Valter works, but only if I replace .getValues() with .getDisplayValues(), which converts every value on the sheet to a string. That is also the format in which the .csv data is imported.
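To illustrate the point in UPD2, here is a minimal sketch of why a strict comparison can fail when one side holds typed values and the other holds CSV text. The helper name sameAsText and the sample rows are hypothetical, and converting with String() is only an approximation: it does not reproduce sheet formatting for dates or formatted numbers, which is what .getDisplayValues() takes care of.

```javascript
// Hypothetical helper: compares two rows cell by cell after
// converting every cell to a string.
function sameAsText(rowA, rowB) {
  return rowA.length === rowB.length && rowA.every(function (cell, i) {
    return String(cell) === String(rowB[i]);
  });
}

// Assumed sample data, not taken from the real sheet.
var fromSheet = [1, 'Vehicle 1', 2];     // typed values, as getValues() may return them
var fromCsv = ['1', 'Vehicle 1', '2'];   // the same cells parsed from CSV text

// Strict comparison treats the number 1 and the string '1' as different.
var strictMatch = fromSheet.every(function (cell, i) {
  return cell === fromCsv[i];
});

// Comparing as text treats the two rows as equal.
var textMatch = sameAsText(fromSheet, fromCsv);

console.log(strictMatch, textMatch); // false true
```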

Upvotes: 1

Views: 1178

Answers (1)

0Valt

Reputation: 10345

Solution

  1. Get CSV values as an Array of Arrays (let's assume that's what CSVToArray() does).
  2. Get current values as an Array of Arrays via getDataRange().getValues().
  3. Filter out duplicate values like this (assuming that every pair of rows has the same length; otherwise the algorithm becomes more complex, but judging from your code, you only compare the leading columns of each row via slice(0,7)):

//source = [[...], ... , [...]];
//target = [[...], ... , [...]];

/**
 * Checks that row `a` differs from every row
 * of `b` in at least one position
 * @param {*[]} a row to test
 * @param {*[][]} b rows to compare against
 * @returns {Boolean}
 */
var unique = function (a,b) {
  return b.every(function(bRow){
    return a.some(function(A,aIdx){
      return A !== bRow[aIdx];
    });
  });
};

/**
 * Leaves only values not present in source
 * @param {*[][]} source
 * @param {*[][]} target
 * @returns {*[][]}
 */
var filterUnique = function (source,target) {
  return target.filter(function (row) {
    return unique(row,source);
  });
};

// Sample data: S* stands for rows already on the sheet, T* for incoming rows.
var S1 = [[1,2,3],['A','D','C'],[5],[7]];
var T1 = [[1,2,3],['A','B','C'],[6]];

var S2 = [[1.00,'Vehicle 1',1.00,'Ready','12-5-2020', 1, 'Event A', 'Location A']];
var T2 = [S2[0],[2.00,'V2',1.00,'Ready','12-5-2020', 1, 'Event A', 'Location A']];

var check1 = filterUnique(S1,T1); // [['A','B','C'],[6]]
var check2 = filterUnique(S2,T2); // [[2,'V2',1,'Ready','12-5-2020',1,'Event A','Location A']]

console.log(check1,check2);
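If the sheet grows large, a key-based lookup may be cheaper than comparing every incoming row against every existing row. The following is only a sketch of that alternative, not the method above: rowKey and filterNewRows are hypothetical names, the keyCols parameter is an assumption modelled on the slice(0,7).join().toLowerCase() check in the question, and Set requires the V8 runtime in Apps Script.

```javascript
// Hypothetical helper: builds a case-insensitive comparison key
// from the first `keyCols` cells of a row.
function rowKey(row, keyCols) {
  return row.slice(0, keyCols).map(String).join('\u0000').toLowerCase();
}

// Hypothetical helper: returns the rows of `incoming` whose key
// is not found among the rows of `existing`.
function filterNewRows(existing, incoming, keyCols) {
  var seen = new Set(existing.map(function (row) {
    return rowKey(row, keyCols);
  }));
  return incoming.filter(function (row) {
    var key = rowKey(row, keyCols);
    if (seen.has(key)) {
      return false;
    }
    seen.add(key); // also drops repeats inside the incoming batch
    return true;
  });
}

// Assumed sample data: the first two columns identify a row.
var onSheet = [['Vehicle 1', 'Ready', 'old note']];
var fromFile = [
  ['vehicle 1', 'READY', 'new note'],  // same key as the sheet row, skipped
  ['Vehicle 2', 'Ready', 'first'],     // new, kept
  ['Vehicle 2', 'Ready', 'repeat']     // repeat within the batch, skipped
];

var rowsToAppend = filterNewRows(onSheet, fromFile, 2);
console.log(rowsToAppend); // [['Vehicle 2', 'Ready', 'first']]
```

In Apps Script, the resulting rows could then be written in one call, for example with sheet.getRange(sheet.getLastRow() + 1, 1, rows.length, rows[0].length).setValues(rows), after checking that the array is not empty. Nothing already on the sheet is cleared.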

Notes

  1. You would need to swap the order of the duplicate check and the timestamping: filter first, then timestamp only the rows that remain. Otherwise there is a risk of appending a row merely because its timestamp is the only difference.
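A minimal sketch of that ordering, under stated assumptions: stampRows is a hypothetical helper, the sample rows are made up, and the two-column comparison stands in for whatever uniqueness check you end up using.

```javascript
// Hypothetical helper: returns copies of `rows` with `timestamp`
// written at index `tsIndex`; the input rows are left untouched.
function stampRows(rows, tsIndex, timestamp) {
  return rows.map(function (row) {
    var copy = row.slice();
    copy[tsIndex] = timestamp;
    return copy;
  });
}

// Assumed sample data: the sheet rows already carry a timestamp in column 3.
var existingRows = [['Vehicle 1', 'Ready', 'earlier import']];
var incomingRows = [['Vehicle 1', 'Ready'], ['Vehicle 2', 'Ready']];

// Step 1: keep only rows whose first two cells match no existing row.
var freshRows = incomingRows.filter(function (row) {
  return !existingRows.some(function (old) {
    return old[0] === row[0] && old[1] === row[1];
  });
});

// Step 2: timestamp only the rows that survived the check.
var stampedRows = stampRows(freshRows, 2, new Date());

console.log(stampedRows.length); // 1
```

Because the timestamp is written after the comparison, it can never be the reason a row counts as new.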

References

  1. every() method ref on MDN
  2. some() method ref on MDN
  3. filter() method ref on MDN

Upvotes: 2
