Reputation: 815
It's hard for me to understand the streaming table in Flink. I can understand Hive, map a fixed, static data file to a "table" but how to embody a table built on streaming data?
For example, every 1 second, 5 events with same structure are sent to a Kafka stream:
{"num":1, "value": "a"}
{"num":2, "value": "b"}
....
What does the dynamic table built on them look like? Flink consumes them all and store them somewhere (memory, local file, hdfs, etc.) then map to a table? Once the "transformmer" finishes processing these 5 events then clear the data and refill the "table" with 5 new events?
Any help is appreciated...
Upvotes: 0
Views: 443
Reputation: 43707
These dynamic tables don't necessarily exist anywhere -- it's simply an abstraction that may, or may not, be materialized, depending on the needs of the query being performed. For example, a query that is doing a simple projection
SELECT a, b FROM events
can be executed by simply streaming each record through a stateless Flink pipeline.
Also, Flink doesn't operate on mini-batches -- it processes each event one at a time. So there's no physical "table", or partial table, anywhere.
But some queries do require some state, perhaps very little, such as
SELECT count(*) FROM events
which needs nothing more than a single counter, while something like
SELECT key, count(*) FROM events GROUP BY key
will use Flink's key-partitioned state (a sharded key-value store) to persist the current counter for each key. Different nodes in the cluster will be responsible for handling events for different keys.
Just as "normal" SQL takes one or more tables as input, and produces a table as output, stream SQL takes one or streams as input, and produces a stream as output. For example, the SELECT count(*) FROM events
will produce the stream 1 2 3 4 5 ...
as its result.
There are some good introductions to Flink SQL on YouTube: https://www.google.com/search?q=flink+sql+hueske+walther, and there are training materials on github with slides and exercises: https://github.com/ververica/sql-training.
Upvotes: 2