Firstly - Thibaut, thank you for Kiba. It goes toe-to-toe with 'enterprise' grade ETL tools and has never let me down.
I'm busy building an ETL pipeline that takes a number of rows and reduces them down into a single summary row. I get the feeling that this should be a simple thing, but I'm a little stumped on how to approach this problem.
We have a number of CDRs (call detail records) from a voice switch, and need to condense them under some simple criteria into a handful of summary records. So the problem is: I have many thousands of records coming in from a Source, and need to transform them into only a few records based on some reduce criteria.
Kiba is really simple when there's a one-to-one Source -> Destination ETL, or even a one-to-many Source -> Destination with the new enumerable exploder in V3, but I don't see a clear path to many-to-one ETL pipelines.
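For contrast, one-to-many is easy because a transform's `process` can yield several rows for a single input row, with no state carried between calls. Below is a sketch with a hypothetical `SplitLegsTransform` and a made-up `:legs` field. It is not kiba-common's actual `EnumerableExploder`, and it is called by hand here rather than through a Kiba job:

```ruby
# Hypothetical one-to-many transform: an input row carrying an array under
# :legs becomes one output row per leg. Illustration only.
class SplitLegsTransform
  def process(row)
    base = row.reject { |key, _| key == :legs } # copy of the row without :legs
    row.fetch(:legs).each do |leg|
      yield base.merge(leg: leg) # each yielded row continues downstream
    end
    nil # everything was yielded, so return nothing extra
  end
end

# Drive it by hand: collect whatever the transform yields.
out = []
SplitLegsTransform.new.process({ call_id: 1, legs: %w[a b c] }) { |r| out << r }
p out
```

Because each `process` call only ever sees one row, nothing in this shape can combine rows across calls, which is exactly the missing piece for many-to-one.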
Any suggestions or guidance would be greatly appreciated.
Glad you find Kiba useful! There are various solutions to this use case.
I'm making some assumptions here, for instance that all the input rows fit in memory. If these are incorrect, solutions still exist, but they will be different (e.g. boundary detection & external storage).
My advice here is to leverage the Kiba v3 ability to yield records in a transform's `close` method (described in more depth in this article):
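```ruby
# Buffers every incoming row in memory, then emits summary rows
# once the source is exhausted.
class InMemoryReduceTransform
  attr_reader :buffer, :summarize_cb

  # summarize_cb: any callable (e.g. a lambda) that takes the array of
  # buffered rows and returns an array of summary rows.
  def initialize(summarize_cb:)
    @buffer = []
    @summarize_cb = summarize_cb
  end

  def process(row)
    buffer << row
    nil # do not forward the row to the rest of the pipeline
  end

  # Called once the source has no more rows; each yielded row
  # continues down the pipeline.
  def close
    summarize_cb.call(buffer).each do |row|
      yield row
    end
  end
end
```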
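To watch this transform work without setting up a Kiba job, you can drive it with a small hypothetical `run_pipeline` helper. The helper is only assumed to mirror the runner contract described in this answer (each row goes through `process`; once the source is exhausted, `close` is called and whatever it yields moves on). It is not Kiba's own runner code, and the `:seconds` field is invented for the example:

```ruby
# The transform from this answer, repeated so the snippet runs on its own.
# Note summarize_cb.call(buffer): the lambda must be invoked with .call.
class InMemoryReduceTransform
  attr_reader :buffer, :summarize_cb

  def initialize(summarize_cb:)
    @buffer = []
    @summarize_cb = summarize_cb
  end

  def process(row)
    buffer << row
    nil
  end

  def close
    summarize_cb.call(buffer).each { |row| yield row }
  end
end

# Hypothetical stand-in for a streaming runner: feed every row to `process`,
# keep any yielded or non-nil returned rows, then call `close` once.
def run_pipeline(rows, transform)
  output = []
  rows.each do |row|
    returned = transform.process(row) { |yielded| output << yielded }
    output << returned if returned
  end
  transform.close { |row| output << row } if transform.respond_to?(:close)
  output
end

total = ->(rows) { [{ count: rows.size, total_seconds: rows.sum { |r| r[:seconds] } }] }
cdrs = [{ seconds: 30 }, { seconds: 45 }, { seconds: 5 }]
result = run_pipeline(cdrs, InMemoryReduceTransform.new(summarize_cb: total))
p result
```

Three input rows go in, and nothing comes out until `close`, which emits the single summary row.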
In essence, you just stack up the input rows until the source runs out of data, at which point the `close` method is called; you then summarise the buffered data and yield N summary rows.
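As an illustration of the summarising step, here is one possible `summarize_cb` for CDRs: group the buffered rows by a key and aggregate each group. The field names (`:trunk`, `:direction`, `:duration`) are assumptions for the example, not a real switch format:

```ruby
# Hypothetical summary callback: one summary row per (trunk, direction)
# pair, with a call count and the summed duration.
summarize_cdrs = lambda do |rows|
  rows.group_by { |r| [r[:trunk], r[:direction]] }.map do |(trunk, direction), group|
    {
      trunk: trunk,
      direction: direction,
      calls: group.size,
      total_duration: group.sum { |r| r[:duration] }
    }
  end
end

rows = [
  { trunk: "T1", direction: "in",  duration: 60 },
  { trunk: "T1", direction: "in",  duration: 30 },
  { trunk: "T2", direction: "out", duration: 15 }
]
p summarize_cdrs.call(rows)
```

Passed as `summarize_cb:` to `InMemoryReduceTransform`, a callback like this collapses thousands of CDRs into one row per key combination.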
Note: this is a simplistic implementation to put you on the right track. The next iteration of Kiba Pro will include a more scalable & generic version of this (with vendor support). Please reach out if you are interested in it!
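If the rows arrive already sorted by the grouping key, the boundary-detection approach mentioned in the assumptions above avoids buffering the whole input: keep only the current group, emit its summary when the key changes, and flush the last group in `close`. This is a sketch with hypothetical `key_fn` and `summarize_cb` parameters. It assumes `process` may yield rows, which is the same mechanism one-to-many transforms rely on:

```ruby
# Hypothetical streaming reducer for input sorted by a grouping key.
# Only the rows of the current group are held in memory.
class SortedGroupReduceTransform
  def initialize(key_fn:, summarize_cb:)
    @key_fn = key_fn
    @summarize_cb = summarize_cb
    @current_key = nil
    @group = []
  end

  def process(row)
    key = @key_fn.call(row)
    if !@group.empty? && key != @current_key
      yield @summarize_cb.call(@current_key, @group) # boundary: emit previous group
      @group = []
    end
    @current_key = key
    @group << row
    nil # rows are only emitted as summaries
  end

  # Flush the final group once the source is exhausted.
  def close
    yield @summarize_cb.call(@current_key, @group) unless @group.empty?
  end
end

t = SortedGroupReduceTransform.new(
  key_fn: ->(r) { r[:trunk] },
  summarize_cb: ->(key, rows) { { trunk: key, calls: rows.size } }
)
out = []
[{ trunk: "T1" }, { trunk: "T1" }, { trunk: "T2" }].each do |row|
  t.process(row) { |summary| out << summary }
end
t.close { |summary| out << summary }
p out
```

The trade-off: if the input is not actually sorted, a key that reappears later produces a second, separate summary row for the same key.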
Let me know if this properly answers your question!