Reputation: 465
I have multiple CSV files stored in GCS, and I want to load them into BigQuery using Cloud Run.
The problem is that I don't know the schema, and the schema is variable (it always changes), but I also don't want to use the autodetect option when loading the files. I want to load the CSV files into BigQuery using the BigQuery API's load job config without a schema and with autodetect=False, with all columns treated as type STRING.
Is that possible?
I tried using a pandas DataFrame, but the files are too large, so there are always memory problems.
Upvotes: 0
Views: 1667
Reputation: 1
Use the following function to generate a schema with all columns as STRING type:
from csv import DictReader

from google.cloud import bigquery


def getschema(file_path):
    '''Get schema from a CSV file with all columns as STRING'''
    schema = []
    with open(file_path, 'r') as read_obj:
        # pass the file object to DictReader() to get the DictReader object;
        # only the header row is read, so large files are not a problem
        csv_dict_reader = DictReader(read_obj)
        # get the column names from the CSV file
        column_names = csv_dict_reader.fieldnames
        for c in column_names:
            schema.append(bigquery.SchemaField(c, "STRING"))
    return schema
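The schema list this returns can then be passed to a load job with autodetect disabled. A minimal sketch, assuming the `google-cloud-bigquery` client library is installed, that a local copy of the file (or at least its header) is available for `getschema`, and that the table ID and GCS URI below are placeholders for your own resources:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical resource names -- replace with your own.
table_id = "my-project.my_dataset.my_table"
uri = "gs://my-bucket/data.csv"

job_config = bigquery.LoadJobConfig(
    schema=getschema("local_copy.csv"),  # schema built by the function above
    autodetect=False,
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # the header row is already captured in the schema
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # wait for the load to complete
```

Because the schema is supplied explicitly and `autodetect=False`, BigQuery does not sample the data, and every column lands as STRING.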
Upvotes: 0