hari

Reputation: 31

How to install sqoop in Amazon EMR?

I've created a cluster in Amazon EMR using emr-4.0.0, with Hadoop distribution Amazon 2.6.0 and Hive 1.0.0. I need to install Sqoop so that I can transfer data between Hive and Redshift. What are the steps to install Sqoop on an EMR cluster? Thank you!

Upvotes: 1

Views: 3118

Answers (3)

Sayat Satybald

Reputation: 6580

Note that starting with emr-4.4.0, AWS added support for Sqoop 1.4.6 to EMR clusters. Installation takes a couple of clicks during cluster setup; no manual installation is needed.
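On those newer release labels, Sqoop can also be requested as an application from the AWS CLI. A sketch, with placeholder instance settings and key name (note the application may be listed as Sqoop-Sandbox on some 4.x releases):

```shell
# Sketch: launch an EMR cluster with Sqoop preinstalled (emr-4.4.0+).
# Instance type/count and key name below are placeholders.
aws emr create-cluster \
    --name "cluster-with-sqoop" \
    --release-label emr-4.4.0 \
    --applications Name=Hadoop Name=Hive Name=Sqoop \
    --instance-type m3.xlarge --instance-count 3 \
    --ec2-attributes KeyName=your-key-pair \
    --use-default-roles
```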

Upvotes: 1

Ana Todor

Reputation: 801

Note that in EMR 4.0.0 hadoop fs -copyToLocal will throw errors.

Use aws s3 cp instead.

To be more specific than Amal:

  1. Download the latest version of Sqoop and upload it to an S3 location. I am using sqoop-1.4.4.bin__hadoop-2.0.4-alpha and it seems to work just fine with EMR 4.0.0.
  2. Download the JAR connector for Redshift and upload it to the same S3 location. This page might help.
  3. Upload a script similar to the one below to S3

    #!/bin/bash
    # Install Sqoop and the Redshift JDBC connector. Store in S3
    # and load as a bootstrap step.
    
    bucket_location='s3://your-sqoop-jars-location/'
    sqoop_jar='sqoop-1.4.4.bin__hadoop-2.0.4-alpha'
    sqoop_jar_gz=$sqoop_jar.tar.gz
    redshift_jar='RedshiftJDBC41-1.1.7.1007.jar'
    
    cd /home/hadoop
    
    aws s3 cp $bucket_location$sqoop_jar_gz .
    tar -xzf $sqoop_jar_gz
    aws s3 cp $bucket_location$redshift_jar .
    cp $redshift_jar $sqoop_jar/lib/
    
  4. Set SQOOP_HOME and add it to the PATH so that sqoop can be called from anywhere. These entries should be made in /etc/bashrc; otherwise you will have to use the full path, in this case /home/hadoop/sqoop-1.4.4.bin__hadoop-2.0.4-alpha/bin/sqoop
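For step 4, the bootstrap script itself could append the entries. A sketch, assuming the tarball was extracted to the same path used in the script above:

```shell
# Sketch: make sqoop callable from anywhere by appending the
# environment entries to /etc/bashrc. The Sqoop directory name
# matches the tarball extracted by the bootstrap script above.
sudo tee -a /etc/bashrc > /dev/null <<'EOF'
export SQOOP_HOME=/home/hadoop/sqoop-1.4.4.bin__hadoop-2.0.4-alpha
export PATH=$PATH:$SQOOP_HOME/bin
EOF
```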

I am using Java to programmatically launch my EMR cluster. To configure bootstrap steps in Java I create a BootstrapActionConfigFactory:

public final class BootstrapActionConfigFactory {
    private static final String bucket = Config.getBootstrapBucket();

    // make class non-instantiable
    private BootstrapActionConfigFactory() {
    }

    /**
     * Adds an install Sqoop step to the job that corresponds to the version set in the Config class.
     */
    public static BootstrapActionConfig newInstallSqoopBootstrapActionConfig() {
        return newInstallSqoopBootstrapActionConfig(Config.getHadoopVersion().charAt(0));
    }

    /**
     * Adds an install Sqoop step to the job that corresponds to the version specified in the parameter
     *
     * @param hadoopVersion the main version number for Hadoop. E.g.: 1, 2
     */
    public static BootstrapActionConfig newInstallSqoopBootstrapActionConfig(char hadoopVersion) {
        return new BootstrapActionConfig().withName("Install Sqoop")
            .withScriptBootstrapAction(
                new ScriptBootstrapActionConfig().withPath("s3://" + bucket + "/sqoop-tools/hadoop" + hadoopVersion + "/bootstrap-sqoop-emr4.sh"));
    }
}

Then when creating the job:

Job job = new Job(Region.getRegion(Regions.US_EAST_1));
job.addBootstrapAction(BootstrapActionConfigFactory.newInstallSqoopBootstrapActionConfig());
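Once the cluster is up and the bootstrap action has run, exporting a Hive table's HDFS directory to Redshift looks roughly like this. The endpoint, database, table, and credentials are placeholders, and the driver class is assumed to match the JDBC41 jar used above:

```shell
# Sketch: export a Hive table's warehouse directory to Redshift.
# Endpoint, database, table and credentials are placeholders; the
# field delimiter assumes Hive's default (Ctrl-A, \001).
sqoop export \
    --connect jdbc:redshift://your-cluster.us-east-1.redshift.amazonaws.com:5439/yourdb \
    --driver com.amazon.redshift.jdbc41.Driver \
    --username your_user \
    --password your_password \
    --table your_table \
    --export-dir /user/hive/warehouse/your_table \
    --input-fields-terminated-by '\001'
```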

Upvotes: 7

Amal G Jose

Reputation: 2546

Download the Sqoop tarball and keep it in an S3 bucket. Create a bootstrap script that performs the following activities:

  1. Download the Sqoop tarball to the required instances
  2. Extract the tarball
  3. Set SQOOP_HOME and add SQOOP_HOME to the PATH. These entries should be made in /etc/bashrc
  4. Add the required connector jars to Sqoop's lib directory

Keep this script in S3 and point to it in the cluster's bootstrap actions.
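Registering the script as a bootstrap action can also be done from the AWS CLI rather than the Java SDK. A sketch, with placeholder bucket, script name, and instance settings:

```shell
# Sketch: launch an EMR cluster that runs the install script from S3
# as a bootstrap action. Bucket, script and key names are placeholders.
aws emr create-cluster \
    --name "cluster-with-sqoop-bootstrap" \
    --release-label emr-4.0.0 \
    --applications Name=Hadoop Name=Hive \
    --instance-type m3.xlarge --instance-count 3 \
    --bootstrap-actions Path=s3://your-bucket/install-sqoop.sh,Name="Install Sqoop" \
    --ec2-attributes KeyName=your-key-pair \
    --use-default-roles
```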

Upvotes: 1
