Reputation: 345
Good day all,
Background story: I have a report in a CSV file emailed to me daily. I'm trying to create a script in GAS that downloads the CSV file from Google Drive and then uploads it into a BigQuery table. GAS is the only scripting tool we have enabled, so I'm stuck with it. I'm new to both JavaScript in general and the BigQuery API.
Code:
function testFunction() {
  var file = DriveApp.getFilesByName("my_csv_file.csv");
  var csv = file.next().getBlob().setContentType('application/octet-stream').getDataAsString();
  var csvData = Utilities.parseCsv(csv);
  // I need to do the following to get the yyyy-MM-dd format
  for (var i = 1; i < csvData.length; i++) {
    var csvdate = csvData[i][1];
    csvData[i][1] = csvdate.substring(6, 10) + "-" + csvdate.substring(3, 5) + "-" + csvdate.substring(0, 2);
  }
  var csvString = csvData.join("\n");
  var blob = Utilities.newBlob(csvString, "text/csv");
  var data = blob.setContentType('application/octet-stream');
  var projectId = 'my_project_id';
  var datasetId = 'my_dataset';
  var tableId = 'bigquery_table';
  var job = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId
        },
        source_format: 'CSV',
        skipLeadingRows: 1,
        allowJaggedRows: 'TRUE',
        allow_quoted_newlines: 'TRUE'
      }
    }
  };
  job = BigQuery.Jobs.insert(job, projectId, data);
}
And the job error I'm getting:
Error encountered during job execution:
Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1290; errors: 1. Please look into the errors[] collection for more details.
Failure details:
- Error while reading data, error message: Too many values in row
starting at position: 239165.
- You are loading data without specifying data format, data will be
treated as CSV format by default. If this is not what you mean,
please specify data format by --source_format.
The thing I don't get: I am specifying the source_format - am I doing it wrong?
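For reference, the BigQuery REST documentation lists the load options in camelCase and with real booleans (sourceFormat, allowJaggedRows, allowQuotedNewlines), so if my snake_case keys are simply being ignored, the load block would presumably need to look like this:

load: {
  destinationTable: {
    projectId: projectId,
    datasetId: datasetId,
    tableId: tableId
  },
  sourceFormat: 'CSV',      // camelCase, per the REST reference
  skipLeadingRows: 1,
  allowJaggedRows: true,    // booleans rather than the string 'TRUE'
  allowQuotedNewlines: true
}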
Where I found the problem: some rows have too many values because they have too many columns, and they have too many columns because some of the product descriptions have stray commas in them. The original CSV file has every cell enclosed in quotation marks, which is a nice way around the problem, I guess. The thing is, I need to change the format of the date column for BigQuery to accept it as a date, and by doing so I seem to erase all the quotation marks: Utilities.parseCsv() strips them when parsing, and joining the arrays back into a string doesn't put them back.
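A quick illustration of the quote stripping, using a made-up two-row sample:

var parsed = Utilities.parseCsv('"a","b,c"\n"1","2"');
// parsed is now [['a', 'b,c'], ['1', '2']] - the quotes are gone
Logger.log(parsed.join("\n"));
// logs:
//   a,b,c
//   1,2
// so the embedded comma in "b,c" is no longer distinguishable from a separator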
Any pointers on how I could fix it, please?
Upvotes: 0
Views: 1507
Reputation: 345
It seems that I've found a solution. The additional commas appear in only one column, so I extended this loop:
for (var i = 1; i < csvData.length; i++) {
  var csvdate = csvData[i][1];
  csvData[i][1] = csvdate.substring(6, 10) + "-" + csvdate.substring(3, 5) + "-" + csvdate.substring(0, 2);
}
with an additional step:
for (var i = 1; i < csvData.length; i++) {
  var csvdate = csvData[i][1];
  csvData[i][1] = csvdate.substring(6, 10) + "-" + csvdate.substring(3, 5) + "-" + csvdate.substring(0, 2);
  // new step: strip the stray commas out of the description column
  var csvdesc = csvData[i][4];
  csvData[i][4] = csvdesc.replace(/,+/g, "");
}
which removes all commas from the column. Phew!
Upvotes: 1
Reputation: 50799
"The original CSV file has all cells encapsulated in quotation marks"

Then do the same when you rebuild the string:
var csvString = csvData.map(function (row) {
  return '"' + row.join('","') + '"';
}).join('\n');
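If a cell can itself contain a double quote, the CSV convention is to escape it by doubling it, so a more defensive version could be:

var csvString = csvData.map(function (row) {
  return row.map(function (cell) {
    // double any embedded quotes, then wrap the whole cell in quotes
    return '"' + String(cell).replace(/"/g, '""') + '"';
  }).join(',');
}).join('\n');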
Upvotes: 0