user1548990
user1548990

Reputation: 43

Bad character (ASCII 0) encountered while importing data

I am trying to import some data and getting the error

"Bad character (ASCII 0) encountered"

the file I'm trying to import is at http://commondatastorage.googleapis.com/snksales/dimdistributor.csv

not able to understand how to fix this.

Upvotes: 4

Views: 18109

Answers (3)

DhruvJoshi
DhruvJoshi

Reputation: 17146

I tried powershell script but it was fast only for ~1 GB file size. For the 16GB file I had it just took forever and actually never finished.

I also had BOM characters in my UTF-8 format csv file I generated via BCP from SQL server.

So this is what I did for a very large csv file. Used 7zip utility to compress in gzip format followed by uploading to GCP cloud storage where I used gsutil command to unzip and remove any ascii 0 characters and save the file again.

gsutil cp gs://csvfiles/originalbaddatafile.csv.gz - | gunzip | tr -d '\000' | gsutil cp - gs://csvfiles/goodfile.csv

Upvotes: 1

Cristian Avendaño
Cristian Avendaño

Reputation: 477

Try cleaning the file with the following PowerShell script:

   $configFiles = Get-ChildItem -Path C:\InputPath\* 
   foreach ($file in $configFiles)
   {
         (Get-Content $file.PSPath) |
         Foreach-Object { $_ -replace "`0", "" } |
         Set-Content $file.PSPath
   }

For using it, just copy your file in C:\InputPath, save the script as Script.ps1 and run it with PowerShell

Upvotes: 2

Jordan Tigani
Jordan Tigani

Reputation: 26637

This file appears to be encoded as UTF-16. BigQuery only supports UTF-8 and latin1 text encodings. Can you reformat it as UTF-8 or ascii? If you are using windows, you should be able to set the encoding by doing a save-as in notepad. If you are using linux or mac, you should be able to do: iconv -f utf-16 -t utf-8 dimdistributor.csv -o dimdistributor_utf8.csv. I ran the latter and was able to import your data.

$xxd dimdistributor.csv | head 0000000: fffe 3100 2c00 3000 3000 3000 3000 3100 ..1.,.0.0.0.0.1. 0000010: 3000 3000 3000 3000 3200 2c00 4d00 2e00 0.0.0.0.2.,.M... 0000020: 4d00 2e00 4500 4e00 5400 4500 5200 5000 M...E.N.T.E.R.P. 0000030: 5200 4900 5300 4500 5300 2c00 3200 0d00 R.I.S.E.S.,.2... 0000040: 0a00 3200 2c00 3000 3000 3000 3000 3100 ..2.,.0.0.0.0.1.

Upvotes: 9

Related Questions