Using Kiba: Is it possible to define and run two pipelines in the same file? Using an intermediate destination & a second source

Question

My processing has a "condense" step before needing further processing:

Source: Raw event/analytics logs of various users.

Transform: Insert each row into a hash according to UserID.

Destination / Output: An in-memory hash like:

{ 
  "user1" => [event, event,...], 
  "user2" => [event, event,...] 
}

Now, I've got no need to store these user groups anywhere, I'd just like to carry on processing them. Is there a common pattern with Kiba for using an intermediate destination? E.g.

# First pass
source EventSource # 10,000 rows of single events
transform {|row| insert_into_user_hash(row)}
@users = Hash.new
destination UserDestination, users: @users

# Second pass
source UserSource, users: @users # 100 rows of grouped events, created in the previous step
transform {|row| analyse_user(row)}

I'm digging around the code and it appears that all transforms in a file are applied to the source, so I was wondering how other people have approached this, if at all. I could save to an intermediate store and run another ETL script, but was hoping for a cleaner way - we're planning lots of these "condense" steps.

Using Kiba: Is it possible to define and run two pipelines in the same file? Using an intermediate destination & a second source

Answers (1)

Related Questions

Using Kiba: Is it possible to define and run two pipelines in the same file? Using an intermediate destination &amp; a second source

Answers (1)

Related Questions

Using Kiba: Is it possible to define and run two pipelines in the same file? Using an intermediate destination & a second source