Michael Hawk
Michael Hawk

Reputation: 65

Google Apps Script - Remove duplicates in large spreadsheets

Using the basic tutorial code from Google works fine for removing duplicates in GoogleSheets. However, since the maximum execution time is 6minutes, I'm running into timedout issues with larger spreadsheets that have thousands of rows. How would I be able to modify this code to work with larger spreadsheets? For example, how would I be able to set a maximum amount of rows to iterate, starting from the bottom?

This is the code:

function removeDuplicates() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = [];
  for (var i in data) {
    var row = data[i];
    var duplicate = false;
    for (var j in newData) {
      if (row.join() == newData[j].join()) {
        duplicate = true;
      }
    }
    if (!duplicate) {
      newData.push(row);
    }
  }
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}

Upvotes: 4

Views: 3667

Answers (3)

Andrea Guerri
Andrea Guerri

Reputation: 29

@Ze Za just do it like this:

function removeDuplicates1() {
    const sheet = SpreadsheetApp.getActiveSheet();
    sheet.getDataRange().removeDuplicates([3]);
}

I don't understand then your goal with row 4 and 5 from your code

Upvotes: 0

Andrea Guerri
Andrea Guerri

Reputation: 29

If you want an alternative to removeDuplicates(), and have an array variable to use as you need, I've found this function useful:

 function removeDuplicates() {
  const sheet = SpreadsheetApp.getActiveSheet(),
        data = sheet.getDataRange().getValues();

  let newValue = data.map(JSON.stringify),
      newData = [...new Set(newValue)],
      unique = Array.from(newData, JSON.parse);

   sheet.getRange(1, 1, unique.length, unique[0].length).setValues(unique);
 }

Upvotes: 0

Tanaike
Tanaike

Reputation: 201513

I believe your goal as follows.

  • You want to delete the duplicated rows.
  • You want to achieve this using Google Apps Script.
  • You want to reduce the process cost of the script.

In this case, I thought that removeDuplicates() might be used for achieving your goal. The sample script is as follows.

Sample script:

function removeDuplicates() {
  SpreadsheetApp.getActiveSheet().getDataRange().removeDuplicates();
}

Or, in this case, Sheets API can be also used as follows. When above script was not useful for your situation, please test the following script. When Sheets API is used, the process cost can be reduced a little. Ref Before you use this, please enable Sheets API at Advanced Gooele services.

function removeDuplicates2() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = ss.getActiveSheet();
  const resource = { requests: [{ deleteDuplicates: { range: { sheetId: sheet.getSheetId() } } }] };
  Sheets.Spreadsheets.batchUpdate(resource, ss.getId());
}

References:

Upvotes: 3

Related Questions