Reputation: 67
I want to move data from my on-premises HDFS cluster to my Azure HDInsight cluster.
I tried the distcp command, but it does not recognize the Data Lake Storage path.
Upvotes: 0
Views: 3506
Reputation: 12788
Steps for connecting on-premises Hadoop to ADLS:
Step 1: Create the Azure Data Lake Store account.
Step 2: Create the identity to access Azure Data Lake Store.
Step 3: Modify the core-site.xml in your on-premises Hadoop cluster.
Step 4: Test connectivity to Azure Data Lake Store from on-premises Hadoop.
Step 5: Use DistCp to transfer the data from on-premises Hadoop to Azure Data Lake Store.
Syntax: hadoop distcp <HDFS_Path> <ADLS_PATH>
Example: hadoop distcp README.txt adl://mydatalakename.azuredatalakestore.net/
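As a rough sketch of Steps 4 and 5 together, the script below first lists the root of the store to check connectivity and then runs the copy. The account name, source path, and the `run` helper with its `DRY_RUN` switch are placeholders introduced for illustration, not part of the original steps. It also assumes Step 3 is already done, i.e. core-site.xml carries the ADLS OAuth2 settings.

```shell
#!/bin/sh
# Placeholder values (assumptions): substitute your own account name and paths.
ADLS_ACCOUNT="mydatalakename"
SRC_PATH="/user/hadoop/README.txt"
DEST_URI="adl://${ADLS_ACCOUNT}.azuredatalakestore.net/"

# Assumes core-site.xml already sets the ADLS credentials from Step 3, typically:
#   fs.adl.oauth2.access.token.provider.type, fs.adl.oauth2.refresh.url,
#   fs.adl.oauth2.client.id, fs.adl.oauth2.credential

# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 4: list the store root to confirm the cluster can reach and authenticate to ADLS.
run hadoop fs -ls "$DEST_URI"

# Step 5: copy the data from HDFS to the Data Lake Store.
run hadoop distcp "$SRC_PATH" "$DEST_URI"
```

If the `hadoop fs -ls` call fails with an unknown scheme or authentication error, the problem is most likely in Step 2 or Step 3 rather than in DistCp itself.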
For more details, refer to "Connecting On-premise Hadoop to Azure Data Lake Store" and "Migrate on-premise Apache Hadoop cluster to Azure HDInsight - data migration best practices".
Hope this helps.
Upvotes: 1