Reputation: 43
I am trying to import some data and getting the error
"Bad character (ASCII 0) encountered"
the file I'm trying to import is at http://commondatastorage.googleapis.com/snksales/dimdistributor.csv
not able to understand how to fix this.
Upvotes: 4
Views: 18109
Reputation: 17146
I tried powershell script but it was fast only for ~1 GB file size. For the 16GB file I had it just took forever and actually never finished.
I also had BOM characters in my UTF-8 format csv file I generated via BCP from SQL server.
So this is what I did for a very large csv file. Used 7zip utility to compress in gzip format followed by uploading to GCP cloud storage where I used gsutil command to unzip and remove any ascii 0 characters and save the file again.
gsutil cp gs://csvfiles/originalbaddatafile.csv.gz - | gunzip | tr -d '\000' | gsutil cp - gs://csvfiles/goodfile.csv
Upvotes: 1
Reputation: 477
Try cleaning the file with the following PowerShell script:
$configFiles = Get-ChildItem -Path C:\InputPath\*
foreach ($file in $configFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace "`0", "" } |
Set-Content $file.PSPath
}
For using it, just copy your file in C:\InputPath, save the script as Script.ps1 and run it with PowerShell
Upvotes: 2
Reputation: 26637
This file appears to be encoded as UTF-16. BigQuery only supports UTF-8 and latin1 text encodings. Can you reformat it as UTF-8 or ascii? If you are using windows, you should be able to set the encoding by doing a save-as in notepad. If you are using linux or mac, you should be able to do: iconv -f utf-16 -t utf-8 dimdistributor.csv -o dimdistributor_utf8.csv
. I ran the latter and was able to import your data.
$xxd dimdistributor.csv | head 0000000: fffe 3100 2c00 3000 3000 3000 3000 3100 ..1.,.0.0.0.0.1. 0000010: 3000 3000 3000 3000 3200 2c00 4d00 2e00 0.0.0.0.2.,.M... 0000020: 4d00 2e00 4500 4e00 5400 4500 5200 5000 M...E.N.T.E.R.P. 0000030: 5200 4900 5300 4500 5300 2c00 3200 0d00 R.I.S.E.S.,.2... 0000040: 0a00 3200 2c00 3000 3000 3000 3000 3100 ..2.,.0.0.0.0.1.
Upvotes: 9