Reputation: 1449
I need to harvest table and column names from the AWS Glue crawler metadata catalog. I used boto3, but I constantly get 100 tables back even though there are more. Setting NextToken doesn't help. Please help if possible.
The desired result is a list as follows:
lst = [table_one.col_one, table_one.col_two, table_two.col_one....table_n.col_n]
def harvest_aws_crawler():
    glue = boto3.client('glue', region_name='')
    response = glue.get_tables(DatabaseName='', NextToken='')
    #response syntax:
    #https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/glue.html#Glue.Client.get_tables
    crawler_list_tables = []
    for tables in response['TableList']:
        while (response.get('NextToken') is not None):
            crawler_list_tables.append(tables['Name'])
            break
    print(len(crawler_list_tables))

harvest_aws_crawler()
UPDATED code, which still needs to produce tablename+columnname:
def harvest_aws_crawler():
    glue = boto3.client('glue', region_name='')
    next_token = ""
    #response syntax:
    #https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/glue.html#Glue.Client.get_tables
    response = glue.get_tables(DatabaseName='', NextToken=next_token)
    tables_from_crawler = []
    while True:
        table_list = response['TableList']
        for table_dict in table_list:
            table_name = table_dict['Name']
            #append table_name+column_name
            for columns in table_name['StorageDescriptor']['Columns']:
                tables_from_crawler.append(table_name + '.' + columns['Name'])
            #tables_from_crawler.append(table_name)
        next_token = response.get('NextToken')
        if next_token is None:
            break
    print(tables_from_crawler)

harvest_aws_crawler()
Upvotes: 2
Views: 13085
Reputation: 41
You should use MaxResults:
response = glue.get_tables(DatabaseName='', NextToken='', MaxResults=number_greater_than_your_actual_tables)
Upvotes: 0
Reputation: 1
tables = list(dynamodb_resource.tables.all())
worked for me. And if I need only the names in my script, I additionally use table_name = tables_names[x].name
Upvotes: 0
Reputation: 1449
Adding a sub-loop did the trick to get the table+column result.
#harvest aws crawler metadata
next_token = ""
client = boto3.client('glue', region_name='us-east-1')
crawler_tables = []
while True:
    response = client.get_tables(DatabaseName='', NextToken=next_token)
    for tables in response['TableList']:
        for columns in tables['StorageDescriptor']['Columns']:
            crawler_tables.append(tables['Name'] + '.' + columns['Name'])
    next_token = response.get('NextToken')
    if next_token is None:
        break

print(crawler_tables)
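For anyone adapting this, the same pagination pattern can also be written without the empty-string NextToken sentinel, by only sending NextToken on follow-up requests. This is a minimal sketch, not part of the original answer: the function name, the injected client, and the database placeholder are my own assumptions.

```python
def harvest_glue_columns(client, database):
    """Collect 'table.column' strings for every table in a Glue database,
    following NextToken until the catalog is exhausted."""
    results = []
    kwargs = {"DatabaseName": database}
    while True:
        response = client.get_tables(**kwargs)  # boto3 Glue client call
        for table in response["TableList"]:
            for column in table["StorageDescriptor"]["Columns"]:
                results.append(table["Name"] + "." + column["Name"])
        token = response.get("NextToken")
        if token is None:
            return results
        kwargs["NextToken"] = token  # only present on follow-up requests

# Usage (assumes AWS credentials are configured):
# client = boto3.client("glue", region_name="us-east-1")
# print(harvest_glue_columns(client, "my_database"))
```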
Upvotes: 2
Reputation: 9464
You can try the approach below, using the paginator option:
def get_tables_for_database(database):
    starting_token = None
    next_page = True
    tables = []
    while next_page:
        paginator = glue_client.get_paginator(operation_name="get_tables")
        response_iterator = paginator.paginate(
            DatabaseName=database,
            PaginationConfig={"PageSize": 100, "StartingToken": starting_token},
        )
        for elem in response_iterator:
            tables += [
                {
                    "name": table["Name"],
                }
                for table in elem["TableList"]
            ]
        try:
            starting_token = elem["NextToken"]
        except KeyError:
            next_page = False
    return tables
and then invoke the method to list the tables for a given database:
for table in get_tables_for_database(database):
    print(f"Table: {table['name']}")
If you want to list the tables for every database in Glue, you may have to add an outer for loop that retrieves the databases first, and then extract the tables using the snippet above as your inner loop for each database.
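That outer loop might look like the following. This is a sketch under the assumption that a boto3 Glue client is available; the function name get_all_table_names and the generator shape are illustrative, not from the answer above. A "get_databases" paginator does exist on the boto3 Glue client, mirroring the "get_tables" one.

```python
def get_all_table_names(glue_client):
    """Yield (database, table) name pairs for every database in the
    Glue Data Catalog, letting paginators handle NextToken for us."""
    db_paginator = glue_client.get_paginator("get_databases")
    table_paginator = glue_client.get_paginator("get_tables")
    for db_page in db_paginator.paginate():
        for database in db_page["DatabaseList"]:
            pages = table_paginator.paginate(DatabaseName=database["Name"])
            for table_page in pages:
                for table in table_page["TableList"]:
                    yield database["Name"], table["Name"]

# Usage (assumes AWS credentials are configured):
# glue_client = boto3.client("glue", region_name="us-east-1")
# for db_name, table_name in get_all_table_names(glue_client):
#     print(f"{db_name}.{table_name}")
```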
Upvotes: 2