Reputation: 43
Locally I am successfully able to (in a task):
I see that blobstore has a reader that would allow me to read the value directly using a streaming, file-like interface -- but that seems to have a 32 MB limit. I also see there's a bulk upload tool -- bulk_uploader.py -- but it won't do all the data massaging I require, and I'd like to limit the writes (and, really, the cost) of this bulk insert.
How would one effectively read and parse a very large (500 MB+) CSV file without the benefit of reading from local storage?
Upvotes: 3
Views: 2201
Reputation: 43
Not the solution I hoped for, but I ended up splitting the large files into 32 MB pieces, uploading each to the blobstore, and then parsing each in a task.
It ain't pretty, but it took less time than the other options.
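For anyone doing the same, the splitting step can be done locally with nothing beyond the standard library. A minimal sketch, assuming rows don't contain embedded newlines inside quoted fields; the file names and the size cap are placeholders:

```go
// Split a large CSV into pieces of roughly 32 MB, cutting only on row
// boundaries so each piece parses cleanly on its own.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

const maxPieceBytes = 32 << 20 // stay under the 32 MB blobstore limit

func main() {
	in, err := os.Open("large.csv") // placeholder input file
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	scanner := bufio.NewScanner(in)
	scanner.Buffer(make([]byte, 0, 1<<20), 1<<20) // allow long rows

	var (
		piece   *os.File
		written int
		index   int
	)
	for scanner.Scan() {
		line := scanner.Text() + "\n"
		// Start a new piece when none is open or the next row would overflow it.
		if piece == nil || written+len(line) > maxPieceBytes {
			if piece != nil {
				piece.Close()
			}
			index++
			piece, err = os.Create(fmt.Sprintf("piece-%03d.csv", index))
			if err != nil {
				log.Fatal(err)
			}
			written = 0
		}
		n, err := piece.WriteString(line)
		if err != nil {
			log.Fatal(err)
		}
		written += n
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	if piece != nil {
		piece.Close()
	}
}
```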
Upvotes: 1
Reputation: 8816
You will need to look at the following options and see if they work for you:
Given the large file size, you should consider using Google Cloud Storage for the file. You can use the command-line utilities that GCS provides to upload your file to your bucket. Once uploaded, you can use the JSON API directly to work with the file and import it into your datastore layer. Take a look at the following: https://developers.google.com/storage/docs/json_api/v1/json-api-go-samples
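The linked samples use the JSON API directly; as a rough sketch of the same idea using the cloud.google.com/go/storage client instead (the bucket name, object name, and per-record handling are placeholders, not taken from the docs above), a streaming read fed into csv.Reader keeps memory bounded by a single record rather than the whole 500 MB file:

```go
// Stream a large CSV straight out of a GCS bucket and parse it record by
// record, without ever holding the whole file in memory.
package main

import (
	"bufio"
	"context"
	"encoding/csv"
	"io"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()

	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Open a streaming reader over the object; nothing is buffered to disk.
	rc, err := client.Bucket("my-bucket").Object("big.csv").NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	r := csv.NewReader(bufio.NewReader(rc))
	for {
		record, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// Massage and persist the record here (e.g. batched datastore puts).
		_ = record
	}
}
```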
If this is a one-time import of a large file, another option could be spinning up a Google Compute Engine VM, writing an app there to read from GCS, and passing the data on in smaller chunks to a service running on App Engine Go, which can then accept and persist the data.
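A rough sketch of the "pass on the data in smaller chunks" part, assuming a hypothetical /import handler on the App Engine side that accepts a JSON array of rows; the endpoint, batch size, and payload shape are all assumptions, not part of the answer above:

```go
// Batch parsed CSV rows and hand them to an App Engine service in small
// chunks. The /import endpoint and the batch size of 500 are assumptions.
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

const (
	importURL = "https://your-app.appspot.com/import" // hypothetical endpoint
	batchSize = 500                                   // rows per request
)

// sendBatch posts one chunk of rows as JSON to the App Engine service.
func sendBatch(rows [][]string) error {
	body, err := json.Marshal(rows)
	if err != nil {
		return err
	}
	resp, err := http.Post(importURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("import failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// On the VM the file could come from GCS (see the reader sketch above);
	// a local file keeps this example self-contained.
	f, err := os.Open("big.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	var batch [][]string
	for {
		record, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		batch = append(batch, record)
		if len(batch) == batchSize {
			if err := sendBatch(batch); err != nil {
				log.Fatal(err)
			}
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		if err := sendBatch(batch); err != nil {
			log.Fatal(err)
		}
	}
}
```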
Upvotes: 2