Failed to build Spark 2.4.3 against Hadoop 3.2.0

Question

I'm building Spark 2.4.3 to make it compatible with the latest Hadoop release, 3.2.0.

The source code was downloaded from https://www.apache.org/dyn/closer.lua/spark/spark-2.4.3/spark-2.4.3.tgz

The build command is ./build/mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.0 -DskipTests clean package

The build result is:

[INFO] Spark Project Parent POM ........................... SUCCESS [  1.761 s]
[INFO] Spark Project Tags ................................. SUCCESS [  1.221 s]
[INFO] Spark Project Sketch ............................... SUCCESS [  0.551 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  0.608 s]
[INFO] Spark Project Networking ........................... SUCCESS [  1.558 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  0.631 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  0.444 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  2.501 s]
[INFO] Spark Project Core ................................. SUCCESS [ 13.536 s]
[INFO] Spark Project ML Local Library ..................... SUCCESS [  0.549 s]
[INFO] Spark Project GraphX ............................... SUCCESS [  1.614 s]
[INFO] Spark Project Streaming ............................ SUCCESS [  3.332 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [ 14.271 s]
[INFO] Spark Project SQL .................................. SUCCESS [ 13.008 s]
[INFO] Spark Project ML Library ........................... SUCCESS [  7.923 s]
[INFO] Spark Project Tools ................................ SUCCESS [  0.187 s]
[INFO] Spark Project Hive ................................. SUCCESS [  6.664 s]
[INFO] Spark Project REPL ................................. SUCCESS [  1.285 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [  4.824 s]
[INFO] Spark Project YARN ................................. SUCCESS [  3.020 s]
[INFO] Spark Project Assembly ............................. SUCCESS [  1.558 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [  1.411 s]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [  1.573 s]
[INFO] Spark Project Examples ............................. SUCCESS [  1.702 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  5.969 s]
[INFO] Spark Avro ......................................... SUCCESS [  0.702 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:32 min
[INFO] Finished at: 2019-07-31T18:56:24+08:00
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hadoop-3.2" could not be activated because it does not exist.

I expected an all-in-one archive such as spark-2.4.3-bin-hadoop3.2.tgz to be generated under the build directory, like the binary that can be downloaded from the official site, https://www.apache.org/dyn/closer.lua/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz.

What does the warning The requested profile "hadoop-3.2" could not be activated because it does not exist mean, and how can I get rid of it?

D3V · Accepted Answer

Caution: what you are trying to do could result in a very unstable environment if you don't know what you are doing.

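That being said, the Spark 2.4.x stable release does not have a hadoop-3.2 profile; it has hadoop-3.1. That is why Maven prints the warning: the profile name passed with -P does not exist in that source tree.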
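
One way to confirm which Hadoop profiles a given source tree defines is to read the profile ids out of its top-level pom.xml. The following is a minimal sketch, not part of the original answer: the function name list_hadoop_profiles is hypothetical, and it assumes each profile id sits on a single line in the form <id>hadoop-X.Y</id>.

```shell
# Hypothetical helper: print every id that starts with "hadoop-"
# found in the given pom.xml (defaults to ./pom.xml).
# Assumes each id is written on one line as <id>hadoop-X.Y</id>.
list_hadoop_profiles() {
  pom="${1:-pom.xml}"
  sed -n 's/.*<id>\(hadoop-[^<]*\)<\/id>.*/\1/p' "$pom"
}

# Usage, from the root of the unpacked Spark source:
#   list_hadoop_profiles pom.xml
```

If hadoop-3.2 is not in the output, Maven ignores -Phadoop-3.2, prints the warning from the question, and continues the build without that profile. Passing a profile that is listed, for example -Phadoop-3.1, should make the warning go away; whether the result then works against Hadoop 3.2.0 is a separate matter. Note also that clean package by itself does not produce a .tgz; the Spark source tree ships dev/make-distribution.sh, whose --tgz option is meant for that.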

You will need to pull the code from master to get what you want.

If your sole intention is to make Spark 2.4.3 compatible with Hadoop 3.2, you could look at the hadoop-3.2 profile in master, along with the related changes, and cherry-pick those into your own workspace.
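
To find candidates for cherry-picking, one option is Git's pickaxe search (git log -S), which lists commits that change how often a given string occurs. This is a minimal sketch, assuming a Git clone of the Spark repository rather than the source tarball from the question; the function name find_profile_commits is hypothetical, and the commits it reports are only a starting point, since the profile may depend on changes elsewhere in the tree.

```shell
# Hypothetical helper: list commits reachable from a ref (default: master)
# in which the number of occurrences of the given string in pom.xml changes,
# i.e. commits that add or remove it.
find_profile_commits() {
  needle="$1"
  ref="${2:-master}"
  git log --oneline -S"$needle" "$ref" -- pom.xml
}

# Usage, inside a clone of the Spark repository:
#   find_profile_commits hadoop-3.2 master
```

Each reported commit can then be inspected with git show before deciding whether to cherry-pick it onto a branch based on the 2.4.3 sources.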
