AWS Glue - how to change column names in Glue Catalog table using BOTO3?

Question

I am using AWS Glue Crawlers to read from S3 zip files (without header) and populate Glue Catalog.

Columns are named by default: col_0, col_1...

How to change those column names using e.g. python boto3 module and interact with AWS Glue catalog directly?

Is there example snippet for doing this?

Thanks.

Ash_s94 · Accepted Answer

You can try pulling the tables and updating the names. Here is an example of what I would do.

First we'll try and retrieve the table:

    database_name = 'ENTER TABLE NAME'
    table_name = 'ENTER TABLE NAME'
    response = self.glue_client.get_table(DatabaseName=database_name,table_name=Name)
    old_table = response['Table']

Next we'll update the table with the values we want changed. The new table we create can only have certain fields in order for the update_table to accept it. So we'll do the following.

    field_names = [
      "Name",
      "Description",
      "Owner",
      "LastAccessTime",
      "LastAnalyzedTime",
      "Retention",
      "StorageDescriptor",
      "PartitionKeys",
      "ViewOriginalText",
      "ViewExpandedText",
      "TableType",
      "Parameters"
    ]
    new_table = dict()
    for key in field_names:
     if key in old_table:
      new_table[key] = old_table[key]

Now that we have the updated table, we can manipulate the column names. Here is an example of changing just 'col_0' to 'new_col'

    for col in new_table['StorageDescriptor']['Columns']:
      if col['Name'] == 'col_0':
        col['Name'] = 'new_col' 
    response=self.glue_client.update_table(DatabaseName=database_name,TableInput=new_table)

Hopefully this helps!

AWS Glue - how to change column names in Glue Catalog table using BOTO3?

Answers (1)

Related Questions