Reputation: 103
Can we use Java for ETL in AWS Glue? It seems there are only two options for Glue ETL programming: Python and Scala.
Upvotes: 3
Views: 7938
Reputation: 170
A bit late to answer, but hopefully this helps someone in the future.
You actually can run Java code but not by supplying the source code in the "Script" section like you can with Python or Scala.
The Glue environment includes a JRE (currently fixed at version 1.8) because Scala is a JVM-based language and needs one to run. To run Java code, you need to ship it as a .jar and find a way to invoke it.
In our case we use Python to trigger a sub-process like:
import sys
import json
import boto3
import subprocess
...
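# Launch the JAR in a child JVM and wait for it to finish
x = subprocess.run([
    'java', '-jar', '<your_jar>.jar', '--foo=bar'
])
print('Response code: ' + str(x.returncode))
# Fail the Glue job if the Java process exited with an error
if x.returncode != 0:
    raise Exception(f"Glue job failed with exit code: {x.returncode}")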
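If you also want the Java process's output in the Glue job logs, a variation along these lines may help. This is only a sketch: `run_command` is a hypothetical helper name, not part of Glue or of our actual job, and it assumes Python 3.7+ for the `text=True` argument.

```python
import subprocess


def run_command(cmd, timeout=None):
    """Run cmd, echo its combined output, and raise if it exits non-zero."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,                 # decode bytes to str
        timeout=timeout,
    )
    # Echo the child's output so it shows up in the job's own log stream
    print(result.stdout, end="")
    if result.returncode != 0:
        raise RuntimeError(
            f"Command {cmd!r} failed with exit code {result.returncode}"
        )
    return result.stdout


# Hypothetical usage inside the Glue script (placeholder JAR name as above):
# run_command(['java', '-jar', '<your_jar>.jar', '--foo=bar'])
```

Collecting the output in memory is fine for short runs; for a long-running or very chatty JAR, streaming the output line by line would probably be the safer choice.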
Now you'll ask: how do I get access to my .jar? One answer is S3. Expand the Advanced properties section of the job, and you will find a Libraries section.
There, add the fully qualified S3 path of your .jar to the Dependent JARs path field. The working directory of a Glue instance (3.0 and 4.0, at the time of writing) is /tmp, and that is where the .jar is copied while the instance initializes. That is why you can execute it with a bare file name, which implicitly points to ./.
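Since the location of the .jar is an observation rather than something documented, it can be worth checking that the file is really there before launching the JVM. A minimal sketch, assuming the behaviour described above (`find_jar` and its default search directories are hypothetical, not a Glue API):

```python
import os


def find_jar(jar_name, search_dirs=None):
    """Return the absolute path of jar_name in the first directory containing it."""
    if search_dirs is None:
        # Assumption taken from the answer above: Glue 3.0/4.0 copies
        # dependent JARs into the working directory, observed to be /tmp.
        search_dirs = [os.getcwd(), "/tmp"]
    for directory in search_dirs:
        candidate = os.path.join(directory, jar_name)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    raise FileNotFoundError(f"{jar_name} not found in {search_dirs}")
```

Failing early with a clear message is usually easier to debug than a JVM error about a JAR file it cannot access.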
From experience, performance is not bad at all (no different from an EC2 instance running the same .jar), but you may need to tune the spawned JVM to get better results. We're using Spring Boot in headless mode with command-line enhancing modules, and it just works great.
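As for tuning the spawned JVM: options such as heap size have to be placed before `-jar` on the command line, while anything after the JAR name is passed to the application. A small sketch of how the command could be assembled (`build_java_command` is a hypothetical helper, and the flag values are purely illustrative, not settings we recommend):

```python
def build_java_command(jar_path, jvm_options=(), app_args=()):
    """Build a `java -jar` argument list with JVM options in the right place."""
    # JVM options (e.g. -Xmx) must precede -jar; application arguments follow the JAR.
    return ["java", *jvm_options, "-jar", jar_path, *app_args]


# Illustrative values only; size the heap to the Glue worker type you use.
cmd = build_java_command(
    "<your_jar>.jar",
    jvm_options=["-Xmx2g", "-XX:+UseG1GC"],
    app_args=["--foo=bar"],
)
```

The resulting list can be handed to `subprocess.run` exactly like the hard-coded list in the snippet above.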
Edit: We initially decided to wrap the call in Python because we weren't sure how to trigger the Spring Boot application from Scala. Digging into the internals of the standalone Spring Boot jar, we found that the entry point is the JarLauncher class, as defined in the manifest file. To trigger a standalone Spring Boot jar, add it to the classpath as explained above, switch the Glue job's language to Scala 2.0, and include this Scala snippet:
import org.springframework.boot.loader.JarLauncher

object DemoApp {
  def main(args: Array[String]): Unit = {
    JarLauncher.main(args)
  }
}
Note: our Spring Boot dependencies are pinned to 2.5.9, which is Java 8-based.
Upvotes: 0
Reputation: 606
No
Q: What programming language can I use to write my ETL code for AWS Glue?
You can use either Scala or Python.
Resource: AWS Glue FAQ
Upvotes: 4