Ash

Reputation: 1456

Hadoop command to run bash script in hadoop cluster

I have a shell script (count.sh) that counts the number of lines in a file. The script has been copied into HDFS, and I'm currently using an Oozie workflow to execute it.

However, I was wondering if there is a way to execute this shell script from command line.

Ex:

In unix: [myuser@myserver ~]$ ./count.sh

What is the equivalent of this when count.sh is at the Hadoop cluster location '/user/cloudera/myscripts/count.sh'?

I read this: Hadoop command to run bash script in hadoop cluster, but I'm still unclear.

Upvotes: 1

Views: 11508

Answers (3)

Lars Gustafsson

Reputation: 41

I know this is an old post, but I just came across it myself and figured I could add a bit of info to it for the future.

This is the same approach Camille described, but it also works with parameters, for instance if you are using bash:

hdfs dfs -cat /path/file | exec bash -s param1 param2 param3 param4

By streaming the file out with cat, you can pipe it into bash for execution; the -s flag makes bash read the script from stdin while still accepting the remaining arguments as positional parameters.
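The same piping pattern can be tried locally without a cluster: replace `hdfs dfs -cat` with a plain `cat` of a local file. This is a minimal sketch with hypothetical paths (/tmp/count.sh, /tmp/data.txt) chosen for illustration:

```shell
# Hypothetical local demo of the cat-into-bash pattern: write a small
# counting script, then feed it to bash via stdin, passing a parameter
# with -s, just as `hdfs dfs -cat script | bash -s args` would.
cat > /tmp/count.sh <<'EOF'
#!/bin/bash
# Count the lines of the file given as the first parameter.
wc -l < "$1"
EOF

printf 'a\nb\nc\n' > /tmp/data.txt

# -s makes bash read the script from stdin and treat the remaining
# arguments as positional parameters ($1, $2, ...).
cat /tmp/count.sh | bash -s /tmp/data.txt
# prints 3
```

On a cluster, only the first command of the pipeline changes: the script is read from HDFS with `hdfs dfs -cat`, but it still executes on the local machine, not on the cluster nodes.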

Upvotes: 3

Camille

Reputation: 11

hadoop fs -cat /path/count.sh | exec sh

Upvotes: 0

tk421

Reputation: 5947

What you're looking for is called Hadoop streaming.

You can look at the official Hadoop Streaming documentation to find out more, or at Writing An Hadoop MapReduce Program In Python (substituting your bash script for the Python one) to understand how to use it.
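As a sketch, a streaming job invocation with a bash script as the mapper looks roughly like the following. The streaming jar path and the input/output paths are assumptions and vary by distribution, so check your own installation:

```shell
# Hypothetical hadoop streaming invocation; the jar path differs per
# distribution (this one is typical for CDH-style layouts).
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -input  /user/cloudera/input \
  -output /user/cloudera/output \
  -mapper count.sh \
  -file   count.sh   # ships the local script to every task node
```

Unlike the cat-into-bash answers above, this actually runs the script on the cluster nodes as map tasks, rather than on the machine where you typed the command.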

Upvotes: 1
