john

Reputation: 1133

How is a crawler better than connecting directly to a DB and retrieving data?

In AWS Glue jobs, there are two approaches to retrieving data from a database or S3: 1) using a crawler, or 2) using a direct connection to the database or S3.

So my question is: how is a crawler better than connecting directly to a database and retrieving the data?

Upvotes: 1

Views: 356

Answers (1)

Naveen

Reputation: 99

AWS Glue crawlers do not retrieve the actual data. A crawler accesses your data stores and works through a prioritized list of classifiers to extract the schema of your data and other statistics, then populates the Glue Data Catalog with this metadata. Crawlers can be scheduled to run periodically so that they detect newly available data as well as changes to existing data, including table definition changes. A crawler automatically adds new tables, new partitions to existing tables, and new versions of table definitions.
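As a rough sketch of what setting up a scheduled crawler might look like with boto3: the crawler name, IAM role ARN, database name, bucket path, and schedule below are all made-up placeholders, and the two helper functions are hypothetical names of my own. Only `create_crawler` and `start_crawler` are actual Glue client calls, and they need AWS credentials to run.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Build keyword arguments for the Glue create_crawler call.

    All values passed in are placeholders chosen for illustration.
    """
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database the tables are written to
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Glue uses a six-field cron expression, e.g. daily at 02:00 UTC.
        request["Schedule"] = schedule
    return request


def create_and_start_crawler(request):
    """Create the crawler and trigger one run (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values; replace with your own before calling create_and_start_crawler.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
    schedule="cron(0 2 * * ? *)",
)
```

Note that nothing here reads rows from S3 into your job. The crawler only writes table definitions into the Data Catalog.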

The AWS Glue Data Catalog becomes a common metadata repository for Amazon Athena, Amazon Redshift Spectrum, and Amazon S3. AWS Glue crawlers help build this metadata repository.
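To make the "metadata, not data" point concrete, here is a small sketch that summarizes a table definition as returned by the Glue `get_table` call. The `describe_table` and `fetch_table_description` helpers are hypothetical names, and the sample response is hand-written to mimic the response shape (a `Table` with `StorageDescriptor.Columns` and `PartitionKeys`); it is not output from a real account.

```python
def describe_table(get_table_response):
    """Summarize the schema stored in a Glue get_table response."""
    table = get_table_response["Table"]
    storage = table.get("StorageDescriptor", {})
    return {
        "name": table["Name"],
        "location": storage.get("Location"),
        "columns": [(c["Name"], c["Type"]) for c in storage.get("Columns", [])],
        "partition_keys": [
            (p["Name"], p["Type"]) for p in table.get("PartitionKeys", [])
        ],
    }


def fetch_table_description(database, table_name):
    """Look the table up in the Data Catalog (requires AWS credentials)."""
    import boto3  # imported here so describe_table works without boto3 installed

    glue = boto3.client("glue")
    return describe_table(glue.get_table(DatabaseName=database, Name=table_name))


# Hand-written sample in the shape of a get_table response (illustrative only).
sample_response = {
    "Table": {
        "Name": "sales",
        "StorageDescriptor": {
            "Location": "s3://example-bucket/sales/",
            "Columns": [
                {"Name": "order_id", "Type": "bigint"},
                {"Name": "amount", "Type": "double"},
            ],
        },
        "PartitionKeys": [{"Name": "year", "Type": "string"}],
    }
}

summary = describe_table(sample_response)
print(summary["columns"])
```

The summary holds only column names, types, the storage location, and partition keys. Services that share the catalog use this definition to find and interpret the underlying files.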

Upvotes: 2
