ErrHuman

Reputation: 345

Uploading CSV file in Google Apps Script to BigQuery table - rogue commas in rows

Good day all,

Background story: I have a report emailed to me daily as a CSV file. I'm trying to write a script in GAS that downloads the CSV file from Google Drive and then uploads it into a BigQuery table. GAS is the only scripting tool we have enabled, so I'm stuck with it. I'm new both to JavaScript in general and to the BigQuery API.

code:

function testFunction() {

  var file = DriveApp.getFilesByName("my_csv_file.csv");
  var csv = file.next().getBlob().setContentType('application/octet-stream').getDataAsString();
  var csvData = Utilities.parseCsv(csv);

  // I need to do the following to get the yyyy-MM-dd format
  for (var i = 1; i < csvData.length; i++) {
    var csvdate = csvData[i][1];
    csvData[i][1] = csvdate.substring(6, 10) + "-" + csvdate.substring(3, 5) + "-" + csvdate.substring(0, 2);
  }

  var csvString = csvData.join("\n");

  var blob = Utilities.newBlob(csvString, "text/csv");
  var data = blob.setContentType('application/octet-stream');

  var projectId = 'my_project_id';
  var datasetId = 'my_dataset';
  var tableId   = 'bigquery_table';

  var job = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId
        },
        source_format: 'CSV',
        skipLeadingRows: 1,
        allowJaggedRows: 'TRUE',
        allow_quoted_newlines: 'TRUE',
      }
    }
  };
  job = BigQuery.Jobs.insert(job, projectId, data);

}

And the job error I'm getting:

Error encountered during job execution:
Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1290; errors: 1. Please look into the errors[] collection for more details.
Failure details:
 - Error while reading data, error message: Too many values in row
   starting at position: 239165.
 - You are loading data without specifying data format, data will be
   treated as CSV format by default. If this is not what you mean,
   please specify data format by --source_format.

The thing I don't get: I am specifying the source_format - am I doing it wrong?

Where I found the problem: some rows have too many values because they have too many columns, and they have too many columns because some of the product descriptions contain stray commas. The original CSV file has every cell wrapped in quotation marks, which is a nice way to get around the problem, I guess. The thing is, I need to change the format of the date column so that BigQuery accepts it as a date, and by doing so I seem to erase all the quotation marks...
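To make the quote loss concrete, here is a minimal plain-JavaScript sketch (the sample rows are invented): `Utilities.parseCsv` strips the quotes while parsing, and `Array.prototype.join` on the resulting 2D array comma-joins each row back with no quoting, so a comma inside a description turns into an extra field separator.

```javascript
// Simulating the state after Utilities.parseCsv has already stripped
// the quotes (sample data is made up for illustration).
var csvData = [
  ["id", "date", "description"],
  ["1", "2019-05-01", "USB cable, 2m"]  // description contains a comma
];

// join on a 2D array stringifies each row as a bare comma-joined list,
// so the embedded comma now looks like a fourth column.
var csvString = csvData.join("\n");
// Row 2 becomes: 1,2019-05-01,USB cable, 2m  (four fields, not three)
```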

Any pointers on how I could fix it, please?

Upvotes: 0

Views: 1507

Answers (2)

ErrHuman

Reputation: 345

It seems that I've found a solution. The additional commas appear only in one column, so I used this loop:

  for (var i = 1; i < csvData.length; i++) {
    var csvdate = csvData[i][1];
    csvData[i][1] = csvdate.substring(6, 10) + "-" + csvdate.substring(3, 5) + "-" + csvdate.substring(0, 2);
  }

and added an additional step:

  for (var i = 1; i < csvData.length; i++) {
    var csvdate = csvData[i][1];
    csvData[i][1] = csvdate.substring(6, 10) + "-" + csvdate.substring(3, 5) + "-" + csvdate.substring(0, 2);
    var csvdesc = csvData[i][4];
    csvData[i][4] = csvdesc.replace(/([,])+/g, "");
  }

which removes all commas from the column. Phew!
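For reference, the added replace call behaves like this on an invented description string (the regex matches one or more commas; a plain `/,/g` would do the same):

```javascript
// Made-up sample description containing stray commas.
var csvdesc = "USB cable, 2m, black";

// Delete every run of commas; surrounding spaces are left alone.
var cleaned = csvdesc.replace(/([,])+/g, "");
// cleaned === "USB cable 2m black"
```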

Upvotes: 1

TheMaster

Reputation: 50799

The original CSV file has all cells encapsulated in quotation marks

Then, do the same.

var csvString = csvData.map(function (row) {
    return '"' + row.join('","') + '"';
}).join('\n');
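A quick check with a made-up row shows the re-quoting keeps the embedded comma inside a single field (note this simple approach assumes no cell contains a double quote itself):

```javascript
// Invented sample row whose last cell contains a comma.
var csvData = [["1", "2019-05-01", "USB cable, 2m"]];

// Wrap every cell in double quotes so embedded commas stay inside one field.
var csvString = csvData.map(function (row) {
    return '"' + row.join('","') + '"';
}).join('\n');
// csvString === '"1","2019-05-01","USB cable, 2m"'
```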

Upvotes: 0
