Reputation: 3080
I'm running a Kiba ETL pipeline in a Rails background job, and I'd like to show the user some status while the job is running. What would be the best way to achieve this?
Can I use some variable somehow?
Or should I save the status update in the database after every step (once in the source, once for every transform, once in the destination)? Once for every transform seems like a lot of additional database writes, and it also seems a bit "dirty" to talk to the database from a transform.
Thanks!
Upvotes: 1
Views: 323
Reputation: 8873
To implement that type of use-case, you have to incorporate some form of progress tracking in your job.
It could report to a database record that models the job (recommended if you are running fairly heavyweight imports and want to be able to search them afterwards), but you can also report to some form of pub-sub system (Redis, Postgres, ActionCable...) if you want something more immediate and lightweight.
A transform is actually a great place to track progress, but this does not mean you have to report at every single row (because it would cause a SQL write at each row, which is usually too much!).
What I recommend is to report the progress only every N rows, using code like this:
pre_process do
  @count ||= 0
end
transform do |r|
  @count += 1
  if @count % 500 == 0
    # TODO here: notify the report system
  end
  r
end
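The same idea can also be written as a class-based transform, which is easier to test in isolation. This is only a minimal sketch: `ProgressTransform` and its `every:` and `reporter:` parameters are hypothetical names, and the reporter is assumed to be any callable (here an in-memory lambda rather than a real database or pub-sub write).

```ruby
# Hypothetical class-based Kiba transform: Kiba calls #process(row) on each row.
class ProgressTransform
  # every:    report once per this many rows (500 mirrors the snippet above)
  # reporter: any callable receiving the running row count (an assumption;
  #           it could update a job record or publish to a pub-sub channel)
  def initialize(every: 500, reporter:)
    @every = every
    @reporter = reporter
    @count = 0
  end

  def process(row)
    @count += 1
    @reporter.call(@count) if (@count % @every).zero?
    row # return the row unchanged so later transforms still receive it
  end
end

# Stand-alone demonstration with an in-memory reporter instead of a database.
reports = []
tracker = ProgressTransform.new(every: 500, reporter: ->(n) { reports << n })
rows = (1..1200).map { |i| { id: i } }
processed = rows.map { |row| tracker.process(row) }
```

In a job definition this would presumably be declared with something like `transform ProgressTransform, every: 500, reporter: my_reporter`, where `my_reporter` is whatever callable talks to your report system.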
You will want to think about what happens if an error occurs while you are notifying the report system: maybe you want to halt everything, or maybe you want to continue.
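One way to make that decision explicit is to wrap the notification in a small object. The sketch below is an assumption-laden illustration, not part of Kiba: `SafeReporter` and `halt_on_error:` are hypothetical names, and the failing lambda merely simulates an unreachable report system.

```ruby
# Hypothetical wrapper: decides what a failed notification does to the job.
class SafeReporter
  attr_reader :failures

  # reporter:      callable doing the real notification (assumed, e.g. a DB write)
  # halt_on_error: true re-raises and stops the job; false records and continues
  def initialize(reporter, halt_on_error: false)
    @reporter = reporter
    @halt_on_error = halt_on_error
    @failures = []
  end

  def call(count)
    @reporter.call(count)
  rescue StandardError => e
    raise if @halt_on_error
    @failures << "#{count}: #{e.message}" # keep a trace instead of failing silently
    nil
  end
end

# Simulated report system that is always down.
flaky = ->(_count) { raise IOError, "report system unreachable" }

lenient = SafeReporter.new(flaky, halt_on_error: false)
lenient.call(500) # swallowed; the import would carry on

strict = SafeReporter.new(flaky, halt_on_error: true)
halted = begin
  strict.call(500)
  false
rescue IOError
  true
end
```

Continuing is usually reasonable when the progress display is purely cosmetic; halting makes more sense if the progress record is also what you rely on to audit the import.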
Also make sure to track the beginning and the end of the job (success, error, completeness) so that you don't end up with stale jobs.
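A rough sketch of that lifecycle tracking, assuming the pipeline is run inside a block: `track_job` is a hypothetical helper, and the `status` Hash stands in for whatever database record or channel you actually report to.

```ruby
# Hypothetical helper: records the lifecycle of a job in `status`, a Hash
# standing in for whatever record or channel the application really uses.
def track_job(status)
  status[:state] = :running
  status[:started_at] = Time.now
  yield
  status[:state] = :succeeded
rescue StandardError => e
  status[:state] = :failed
  status[:error] = e.message
  raise # let the background job framework see the failure too
ensure
  status[:finished_at] = Time.now # set on both paths, so no job stays "running"
end

ok = {}
track_job(ok) { :pipeline_ran } # in a real job the block would run the pipeline

broken = {}
begin
  track_job(broken) { raise ArgumentError, "bad row" }
rescue ArgumentError
  # expected: the error is re-raised after being recorded
end
```

The `ensure` clause is what guards against stale jobs here: the end of the job is recorded whether the block succeeds or raises.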
It seems a bit "dirty" to talk to the database, but only because we are mixing concerns a bit. If you do it every N rows & make sure not to pollute the main system, it's perfectly fine!
Upvotes: 2