Reputation: 1004
I am using a Glue crawler to create a table in Athena for a set of CSV files being generated by an external provider. These files do not have headers, but instead come with a separate, one-line CSV file specifying the headers. There are more than 1000 columns, so manually editing the schema to name the columns from Glue's default col0, col1, col2 is a last resort. Is there a way to tell Glue/Athena to pick the column names from a separate file than the data?
Upvotes: 0
Views: 1285
Reputation: 1004
I was able to do this using boto3 with the update_table
method. Much of the solution is at AWS Glue - how to change column names in Glue Catalog table using BOTO3? which gives an example for renaming a single column. To rename all columns based on an external file, instead of the single-column method I used the following:
with open('column_headers.tsv') as cfile:
creader = csv.reader(cfile, delimiter='\t')
for row in creader:
colnames = row
old_colnames = [oc['Name'] for oc in old_table['StorageDescriptor']['Columns']]
col_map = dict(zip(old_colnames, colnames))
for col in new_table['StorageDescriptor']['Columns']:
col['Name'] = col_map[col['Name']]
client.update_table(DatabaseName=db_name, TableInput=new_table)
Upvotes: 0