Reputation: 136
I have to load text files into Oracle tables. Currently the process is done with bash scripts, SQL*Loader, and command-line tools for validation.
I'm trying to find more robust alternatives. The two options I came up with are Luigi (Python framework) and Spring Batch.
I built a small POC using Spring Batch, but I think it has a lot of boilerplate code and might be overkill. I also prefer Python over Java. The good thing about Spring Batch is the job-tracking schema that comes out of the box with the framework.
Files contain from 200k to 1 million records. No transformations are performed, only datatype and length validations. The first steps of the job consist of checking the header and trailer, checking some dates, querying a parameters table, and truncating the staging table.
Could you give me some pros and cons of each framework for this use case?
Upvotes: 2
Views: 3403
Reputation: 15330
I would argue they are not equivalent technologies. Luigi is more of a workflow/process management framework that can help organize and orchestrate many different batch jobs.
The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop jobs, dumping data to/from databases, running machine learning algorithms, or anything else. https://luigi.readthedocs.io/en/stable/
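To make the "plumbing" idea concrete, here is a minimal sketch in plain Python of the model that description implies: each task declares what it depends on and what it produces, and a runner executes dependencies first while skipping anything whose output already exists. This is not Luigi's API; the `Task` base class, the `build` function, and the `TruncateStaging` and `LoadFile` task names are all hypothetical, and the marker files stand in for real work.

```python
import os
import tempfile


class Task:
    """Hypothetical base class: a task is complete when its output file exists."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []  # upstream tasks that must finish first

    def output(self):
        return os.path.join(self.workdir, type(self).__name__ + ".done")

    def complete(self):
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


class TruncateStaging(Task):
    def run(self):
        # Placeholder for the real work (e.g. truncating the staging table).
        with open(self.output(), "w") as f:
            f.write("truncated\n")


class LoadFile(Task):
    def requires(self):
        return [TruncateStaging(self.workdir)]

    def run(self):
        # Placeholder for the real load (e.g. invoking SQL*Loader).
        with open(self.output(), "w") as f:
            f.write("loaded\n")


def build(task, log):
    """Run dependencies first, then the task; skip anything already complete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


workdir = tempfile.mkdtemp()
first_run, second_run = [], []
build(LoadFile(workdir), first_run)   # runs TruncateStaging, then LoadFile
build(LoadFile(workdir), second_run)  # nothing to do: outputs already exist
print(first_run, second_run)
```

The second call does nothing because both marker files are already present, which is the property that makes a failed pipeline safe to re-run from the top. A workflow framework adds scheduling, parameters, and failure reporting on top of this basic idea.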
Spring Batch gives you a reusable framework for structuring a batch job. It gives you a lot of things out of the box, like being able to read input from text files and write output to databases.
A lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems.
Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management https://spring.io/projects/spring-batch
You could theoretically run Spring Batch jobs with Luigi.
Based on the brief description of your use case, it sounds like the bread-and-butter scenario that inspired Spring Batch in the first place. In fact, their 15-minute demo application covers reading from a file and loading the records into a database over JDBC: https://spring.io/guides/gs/batch-processing/
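Whichever framework you pick, the validation steps from the question stay roughly the same. As a rough sketch only, here is how the header, trailer, datatype, and length checks might look in plain Python. The file layout is an assumption, since the question does not give one: pipe-delimited records, a header of the form `H|YYYYMMDD`, a trailer of the form `T|<record count>`, and the three fields listed in the hypothetical `FIELDS` table.

```python
from datetime import datetime

# Hypothetical layout: (field name, converter, maximum length).
FIELDS = [
    ("id", int, 10),
    ("name", str, 30),
    ("amount", float, 12),
]


def validate_lines(lines):
    """Return a list of (line number, message) errors; an empty list means valid."""
    errors = []
    if len(lines) < 2:
        return [(0, "file needs a header and a trailer")]
    header, *records, trailer = lines

    # Header: record type 'H' plus a date in YYYYMMDD format.
    parts = header.split("|")
    if len(parts) != 2 or parts[0] != "H":
        errors.append((1, "bad header"))
    else:
        try:
            datetime.strptime(parts[1], "%Y%m%d")
        except ValueError:
            errors.append((1, "bad header date"))

    # Trailer: record type 'T' plus the expected record count.
    parts = trailer.split("|")
    if len(parts) != 2 or parts[0] != "T" or not parts[1].isdigit():
        errors.append((len(lines), "bad trailer"))
    elif int(parts[1]) != len(records):
        errors.append((len(lines), "record count mismatch"))

    # Detail records: check field count, length, and datatype.
    for lineno, record in enumerate(records, start=2):
        values = record.split("|")
        if len(values) != len(FIELDS):
            errors.append((lineno, "wrong field count"))
            continue
        for (name, convert, max_len), value in zip(FIELDS, values):
            if len(value) > max_len:
                errors.append((lineno, f"{name} too long"))
                continue
            try:
                convert(value)
            except ValueError:
                errors.append((lineno, f"{name} is not a valid {convert.__name__}"))
    return errors


sample = ["H|20240131", "1|Alice|10.50", "x|Bob|7", "T|2"]
print(validate_lines(sample))  # [(3, 'id is not a valid int')]
```

For files of 200k to 1 million records you would probably stream the file rather than hold every line in memory, but the checks themselves would not change.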
Upvotes: 3