Reputation: 137
I am trying to connect from AWS Glue to a remote server via SFTP using Python 3.7. I tried the pysftp library for this task.
However, pysftp depends on a library named bcrypt, which contains both Python and C code. At the moment, AWS Glue only supports pure-Python libraries, as mentioned in the documentation (link below).
https://docs.aws.amazon.com/glue/latest/dg/console-custom-created.html
The error I am getting is shown below.
ImportError: cannot import name '_bcrypt'
I am stuck here because the compiled part of bcrypt cannot be imported.
Hence, I tried the JSch Java library using Scala. There, compilation succeeds, but I get the exception below.
com.jcraft.jsch.JSchException: java.net.UnknownHostException: [Remote Server Hostname]
How can we connect to a remote server via SFTP from AWS Glue? Is it possible?
How can we configure outbound rules (if required) for a Glue job?
Upvotes: 4
Views: 9921
Reputation: 81
AWS now has the SFTP Connector for Glue available.
The SFTP Connector for AWS Glue simplifies the process of connecting AWS Glue jobs to extract data from SFTP Storage, and also load data into SFTP Storage. This connector provides comprehensive access to SFTP Storage, facilitating cloud ETL processes for operational reporting, backup and disaster recovery, data governance, and more.
Upvotes: 0
Reputation: 1
I know it has been some time since this question was posted, so I would like to share some tools that can help you get data from an SFTP server more easily and quickly. To build a layer the easy way, use this tool: https://github.com/aws-samples/aws-lambda-layer-builder. With it you can make a pysftp layer quickly and avoid those annoying errors (cffi, bcrypt).
Lambda has a limit of 500 MB, so if you are trying to extract large files, the function will crash for this reason. To fix this, attach EFS (Elastic File System) to your Lambda function: https://docs.aws.amazon.com/lambda/latest/dg/services-efs.html
Upvotes: -1
Reputation: 137
I am answering my own question here for anyone this might help.
The straight answer is no.
I found the below resources which indicate that AWS Glue is an ETL tool for AWS resources.
AWS Glue uses other AWS services to orchestrate your ETL (extract, transform, and load) jobs to build a data warehouse.
Source - https://docs.aws.amazon.com/glue/latest/dg/how-it-works.html
Glue works well only with ETL from JDBC and S3 (CSV) data sources. In case you are looking to load data from other cloud applications, File Storage Base, etc. Glue would not be able to support.
Source - https://hevodata.com/blog/aws-glue-etl/
Hence to implement what I was working on, I used an AWS Lambda function to connect to the remote server via SFTP, pick the required files and drop them in an S3 bucket. The AWS Glue job can now pick the files from S3.
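For anyone who wants a starting point, here is a minimal sketch of such a Lambda function. It is not the exact code I used. It assumes the paramiko library (packaged in a Lambda layer, since it also has compiled dependencies) and boto3. The environment variable names (SFTP_HOST, SFTP_PORT, SFTP_USER, SFTP_PASSWORD, SFTP_REMOTE_DIR, TARGET_BUCKET, TARGET_PREFIX) and the helper copy_sftp_dir_to_s3 are hypothetical names chosen for illustration.

```python
import os
import posixpath
import stat


def copy_sftp_dir_to_s3(sftp, s3, remote_dir, bucket, prefix=""):
    """Stream every regular file in remote_dir to the given S3 bucket.

    sftp is expected to behave like paramiko.SFTPClient and s3 like a
    boto3 S3 client. Returns the list of S3 keys that were written.
    """
    uploaded = []
    for entry in sftp.listdir_attr(remote_dir):
        # Skip directories, symlinks, and entries without mode information.
        if entry.st_mode is None or not stat.S_ISREG(entry.st_mode):
            continue
        remote_path = posixpath.join(remote_dir, entry.filename)
        key = posixpath.join(prefix, entry.filename) if prefix else entry.filename
        # Stream the remote file straight into S3 without writing to /tmp.
        with sftp.open(remote_path, "rb") as remote_file:
            s3.upload_fileobj(remote_file, bucket, key)
        uploaded.append(key)
    return uploaded


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without these packages.
    import boto3
    import paramiko

    ssh = paramiko.SSHClient()
    # Assumption for this sketch only: accept unknown host keys.
    # In production, load and verify the server's known host key instead.
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(
        hostname=os.environ["SFTP_HOST"],
        port=int(os.environ.get("SFTP_PORT", "22")),
        username=os.environ["SFTP_USER"],
        password=os.environ["SFTP_PASSWORD"],
    )
    try:
        sftp = ssh.open_sftp()
        keys = copy_sftp_dir_to_s3(
            sftp,
            boto3.client("s3"),
            os.environ["SFTP_REMOTE_DIR"],
            os.environ["TARGET_BUCKET"],
            os.environ.get("TARGET_PREFIX", ""),
        )
    finally:
        ssh.close()
    return {"uploaded": keys}
```

Streaming through upload_fileobj avoids staging files on the function's local disk. The Lambda function still needs network access to the remote host, so if it runs inside a VPC, the subnet routing and security group outbound rules must allow traffic to the SFTP port.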
Upvotes: 7