DMarx
DMarx

Reputation: 377

Google Script to remove duplicate rows based on 2 columns criteria

I am using a script in the that pulls events details from a calendar and adds them into column A and B in a spreadsheet, remove any duplicate events and then sort based on date. My hope is then that I can have staff add additional data about these events in columns C,D, etc.

This seems to be working fine, but once information is added in columns C,D the scripts de-duplication function stops working because it is comparing the full row and not simply the content being imported.

Is there a method to adjust the below de-duplication script to only check columns A and B when deciding if rows are duplicate and should be removed?

I have attempted to use the code (currently commented out) adjustment below as found in the variation section in this article: https://developers.google.com/apps-script/articles/removing_duplicates - but it still does not seem to work.

Thank you for any assistance

The Script:

enter code here

//this section retrieves the information from a calendar from a user submitted date until the end of the year

function importEvents(){
  var calID = Browser.inputBox("Please enter your google Cal ID", Browser.Buttons.OK_CANCEL);
  var startdate = Browser.inputBox("Start date using 1/1/2013 format", Browser.Buttons.OK_CANCEL);
  var cal = CalendarApp.getCalendarById(calID);
  var events_sheet = SpreadsheetApp.getActiveSheet();
  var events = cal.getEvents(new Date(startdate), new Date("1/1/2014"));
  var lr = events_sheet.getLastRow();
  var eventarray = new Array();

  var i = 0; // edited
    for (i = 0; i < events.length; i++) {
      line = new Array();
      line.push(events[i].getStartTime());
      line.push(events[i].getTitle());

     //Potential more data that I am not getting at this time
     // line.push(events[i].getDescription());
     // line.push(events[i].getEndTime());

      eventarray.push(line);
    }

    events_sheet.getRange("A"+(lr+1)+":B"+(lr+i)).setValues(eventarray);

  //sort ascending dy date

  var range = SpreadsheetApp.getActiveSheet().getRange("A3:F2000"); 
  range.sort([{column: 1, ascending: true}]); 

  //removes duplicate rows

  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = new Array();
  for(i in data){
    var row = data[i];
    var duplicate = false;
    for(j in newData){

      if(row.join() == newData[j].join()){
  duplicate = true;

   //This was supposed to only check the first 2 columns, but it doesn't work
   //I found this code in the variation section of this tutorial: https://developers.google.com/apps-script/articles/removing_duplicates   
   //     
   //  if(row[0] == newData[j][0] && row[1] == newData[j][1]){
   // duplicate = true;

      }
    }
    if(!duplicate){
      newData.push(row);
    }
  }
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);

}

Upvotes: 2

Views: 3892

Answers (2)

Serge insas
Serge insas

Reputation: 46794

I would suggest that you stay at array level all along, including the sort process.

You can easily choose on what cell you sort by using a sort function like below :

// This example sorts on first column of data array
  data.sort(function(x,y){
  var xp = x[0];
  var yp = y[0];
  //Logger.log(xp+'   '+yp);// just to check that it takes the right column
  return xp == yp ? 0 : xp < yp ? -1 : 1;// first sort  ascending
});

As for the duplicate removal there are a few ways to do it, one possible is like that :

var newData = new Array();
  for(var i in data){
    var duplicate = false;
    for(var j in newData){

      if(data[i][0].toString()+data[i].toString() == newData[j][0].toString()+newData[j][1].toString()){ duplicate = true }
    }
    if(!duplicate){ newData.push(data[i]) }
  }

Upvotes: 0

Mogsdad
Mogsdad

Reputation: 45710

That commented out block can be made to work, with some tweaking. A problem arises because of the nature of your data, and the fact that the tutorial didn't take into account the way that object comparisons work. (Refer to JavaScript Date Object Comparison.)

Your first column consists of Date objects, which the == comparator will evaluate true only when both sides of the comparison are the same object. In the whole-row comparison, the dates are being coerced to strings by the .join() operation. We can get the same effect on a cell-by-cell basis, like this:

 if(row[0].toString() == newData[j][0].toString() && row[1] == newData[j][1]){
   duplicate = true;
 }

Alternatively, we can restrict the comparison to the first two columns by using the Array.slice() operation. This way, we don't need to know what types are being compared, because we'll still use .join() to form a string for the comparison:

if(row.slice(0,2).join() == newData[j].slice(0,2).join()){
  duplicate = true;
}

Sorting

Another issue you'll run into is your placement of range.sort(). It's dangerous to sort before removing duplicates, because you can't guarantee the order of newly-retrieved event data vs data for the same event that you already had which may have additional columns of information. As a result, the duplicate removal could erase your user-entered info. It's safer to leave sorting to be a final step, or include additional columns in the sort to guarantee order.

Another little issue: The use of .getRange("A3:F2000") has a side-effect of expanding your spreadsheet to 2000 lines; you could instead use .getRange("A3:F"), which will retrieve the maximum range without expanding it.

But I suggest you do the sorting entirely with javascript arrays, which will be much faster than using spreadsheet services. I'm assuming you have two lines of header information that you want to retain at the top of the sheet, since you were sorting from A3. Here's a way to do the sort:

// sort ascending by date - retain header lines
var headers = newData.slice(0,2);
var sorted = newData.slice(2).sort(sortFunction);
var newData = headers.concat(sorted);

Where sortFunction() is:

function sortFunction( a, b ) {
  // coerce dates to numbers and return comparison
  return ((+a[0]) - (+b[0]))
}

Final Script

Here's what you get, with the above changes.

function importEvents(){
  var calID = Browser.inputBox("Please enter your google Cal ID", Browser.Buttons.OK_CANCEL);
  var startdate = Browser.inputBox("Start date using 1/1/2013 format", Browser.Buttons.OK_CANCEL);
  var cal = CalendarApp.getCalendarById(calID);
  var events_sheet = SpreadsheetApp.getActiveSheet();
  var events = cal.getEvents(new Date(startdate), new Date("1/1/2014"));
  var lr = events_sheet.getLastRow();
  var eventarray = new Array();

  var i = 0; // edited
  for (i = 0; i < events.length; i++) {
    line = new Array();
    line.push(events[i].getStartTime());
    line.push(events[i].getTitle());

     //Potential more data that I am not getting at this time
     // line.push(events[i].getDescription());
     // line.push(events[i].getEndTime());

    eventarray.push(line);
  }

  // Append the retreived events to existing spreadsheet
  events_sheet.getRange("A"+(lr+1)+":B"+(lr+i)).setValues(eventarray);

  //remove duplicate rows
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = new Array();
  for(i in data){
    var row = data[i];
    var duplicate = false;
    for(j in newData){

      if(row.slice(0,2).join() == newData[j].slice(0,2).join()){
        duplicate = true;
      }
    }
    if(!duplicate){
      newData.push(row);
    }
  }

  // sortFunction used to compare rows of data in our newData array
  function sortFunction( a, b ) {
    // coerce dates to numbers and return comparison
    return ((+a[0]) - (+b[0]))
  }

  // sort ascending by date - retain header lines
  var headers = newData.slice(0,2);
  var sorted = newData.slice(2).sort(sortFunction);
  var newData = headers.concat(sorted);

  // Clear the existing info and update with newData.
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}

Upvotes: 3

Related Questions