Teejay

Reputation: 111

How to upload a local CSV to Google BigQuery using Python

I'm trying to upload a local CSV file to Google BigQuery using Python:

def uploadCsvToGbq(self,table_name):


    load_config = {
        'destinationTable': {
            'projectId': self.project_id,
            'datasetId': self.dataset_id,
            'tableId': table_name
        }
    }

    load_config['schema'] = {
        'fields': [
            {'name': 'full_name', 'type': 'STRING'},
            {'name': 'age', 'type': 'INTEGER'},
        ]
    }
    load_config['sourceFormat'] = 'CSV'

    upload = MediaFileUpload('sample.csv',
                             mimetype='application/octet-stream',
                             # This enables resumable uploads.
                             resumable=True)
    start = time.time()
    job_id = 'job_%d' % start
    # Create the job.
    result = bigquery.jobs.insert(
        projectId=self.project_id,
        body={
            'jobReference': {
                'jobId': job_id
            },
            'configuration': {
                'load': load_config
            }
        },
        media_body=upload).execute()

    return result

When I run this, it throws the following error:

"NameError: global name 'MediaFileUpload' is not defined"

Do I need to import an additional module? Please help.

Upvotes: 4

Views: 6676

Answers (3)

Samihan Jawalkar

Reputation: 113

One of the easiest ways to upload a CSV file to GBQ is through pandas. Read the CSV file into a DataFrame (pd.read_csv()), then write the DataFrame to GBQ (df.to_gbq(full_table_id, project_id=project_id)).

import pandas as pd

# to_gbq relies on the pandas-gbq package being installed.
df = pd.read_csv('/..localpath/filename.csv')
# full_table_id has the form 'dataset.table'.
df.to_gbq(full_table_id, project_id=project_id)

Alternatively, you can use the client API:

from google.cloud import bigquery
import pandas as pd
df = pd.read_csv('/..localpath/filename.csv')
client = bigquery.Client()
dataset_ref = client.dataset('my_dataset')
table_ref = dataset_ref.table('new_table')
client.load_table_from_dataframe(df, table_ref).result()

Upvotes: 5

Guest5489

Reputation: 21

pip install --upgrade google-api-python-client

Then, at the top of your Python file, add:

from googleapiclient.http import MediaFileUpload

But be careful: you are also missing some parentheses, since jobs must be called as jobs(). It is better to write:

result = bigquery.jobs().insert(
    projectId=PROJECT_ID,
    body={
        'jobReference': {'jobId': job_id},
        'configuration': {'load': load_config}
    },
    media_body=upload).execute(num_retries=5)

Also, note that this uploads every row of your CSV, including the header row that defines the column names.
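
If the header row should not end up in the table, the load configuration can ask BigQuery to skip it through the skipLeadingRows field. Here is a minimal sketch that rebuilds the question's load_config as a plain dictionary; the helper name build_load_config, the skip_header flag, and the example project, dataset, and table names are made up for illustration.

```python
def build_load_config(project_id, dataset_id, table_name, skip_header=True):
    """Build the 'load' section of a BigQuery job body for a CSV upload.

    build_load_config and skip_header are hypothetical names for this sketch.
    """
    load_config = {
        'destinationTable': {
            'projectId': project_id,
            'datasetId': dataset_id,
            'tableId': table_name,
        },
        # Same schema as in the question.
        'schema': {
            'fields': [
                {'name': 'full_name', 'type': 'STRING'},
                {'name': 'age', 'type': 'INTEGER'},
            ]
        },
        'sourceFormat': 'CSV',
    }
    if skip_header:
        # Number of rows at the top of the CSV that BigQuery should ignore.
        load_config['skipLeadingRows'] = 1
    return load_config


# Placeholder identifiers; substitute your own project, dataset and table.
config = build_load_config('my-project', 'my_dataset', 'people')
```

The resulting dictionary can be passed as the 'load' value under 'configuration' in the jobs().insert call above.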

Upvotes: 2
