Reputation: 6592
I set up a Kafka consumer-producer system, and I need to process the transmitted messages. Each message is one line of a JSON file, for example:
ConsumerRecord(topic=u'json_data103052', partition=0, offset=676, timestamp=1542710197257, timestamp_type=0, key=None, value='{"Name": "Simone", "Surname": "Zimbolli", "gender": "Other", "email": "[email protected]", "country": "Nigeria", "date": "11/07/2018"}', checksum=354265828, serialized_key_size=-1, serialized_value_size=189)
I am looking for an easy-to-implement solution to
Does anybody have suggestions on how to proceed? Thanks.
I am having issues using Spark, so I would prefer to avoid it. I am scripting in Python using Jupyter.
Here is my code:
from kafka import KafkaConsumer

bootstrap_servers = ['localhost:9092']

# Get the topic name stored by the Kafka producer notebook (IPython %store magic)
%store -r topicName
print(topicName)

# With no committed offsets, start reading from the earliest available message
consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers,
                         auto_offset_reset='earliest')
consumer.subscribe([topicName])

# Print every record as it arrives (the loop blocks waiting for new messages)
for message in consumer:
    print(message)
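The value of each record is the JSON line itself (a string, or bytes under Python 3 with kafka-python's default settings), so processing usually starts by decoding it. A minimal sketch, assuming every value is a single JSON object like the sample record above; `parse_record` and the `FakeRecord` stand-in are hypothetical names used only for illustration:

```python
import json
from collections import namedtuple
from datetime import datetime, timezone


def parse_record(record):
    """Turn a consumed Kafka record into a plain dict.

    Assumes record.value holds one JSON object. json.loads accepts both
    str and bytes, so this works whether or not a deserializer is set.
    """
    data = json.loads(record.value)
    # record.timestamp is milliseconds since the Unix epoch
    data["event_time"] = datetime.fromtimestamp(record.timestamp / 1000, tz=timezone.utc)
    return data


# Hypothetical stand-in with the two ConsumerRecord fields used above
FakeRecord = namedtuple("FakeRecord", ["timestamp", "value"])

sample = FakeRecord(timestamp=1542710197257,
                    value=b'{"Name": "Simone", "country": "Nigeria"}')
print(parse_record(sample)["country"])  # Nigeria
```

Inside the consumer loop this would be `data = parse_record(message)`. Alternatively, kafka-python can decode values for you if the consumer is created with `value_deserializer=lambda v: json.loads(v.decode('utf-8'))`.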
Upvotes: 0
Reputation: 20840
For your scenario, Kafka Streams seems suitable. It supports windowing, with the following four window types:
Tumbling time window: time-based, fixed-size, non-overlapping, gap-less windows
Hopping time window: time-based, fixed-size, overlapping windows
Sliding time window: time-based, fixed-size, overlapping windows that work on differences between record timestamps
Session window: session-based, dynamically sized, non-overlapping, data-driven windows
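To make tumbling, hopping and session windows concrete, here is a plain-Python sketch of which window(s) a record timestamp falls into. It follows the definitions above, assuming epoch-aligned, half-open `[start, start + size)` windows and a session that closes after an inactivity gap, but it is only an illustration, not Kafka Streams' implementation; the function names are hypothetical, and sliding windows are left out:

```python
def tumbling_window(ts, size):
    """Start of the single tumbling window [start, start + size) containing ts."""
    return ts - ts % size


def hopping_windows(ts, size, advance):
    """Starts of every hopping window [start, start + size) containing ts.

    Window starts are multiples of `advance`; with advance == size this
    reduces to a single tumbling window.
    """
    starts = []
    start = ts - ts % advance  # latest window start that is not after ts
    while start >= 0 and start > ts - size:
        starts.append(start)
        start -= advance
    return sorted(starts)


def session_windows(timestamps, gap):
    """Group timestamps into sessions separated by more than `gap` of inactivity.

    Assumption: a gap of exactly `gap` still joins the current session.
    Returns one (first, last) timestamp pair per session.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1][1] = ts  # extend the current session
        else:
            sessions.append([ts, ts])  # open a new session
    return [tuple(s) for s in sessions]


print(tumbling_window(12, 10))                          # 10
print(hopping_windows(12, size=10, advance=5))          # [5, 10]
print(session_windows([1, 2, 3, 10, 11, 30], gap=5))    # [(1, 3), (10, 11), (30, 30)]
```

The key difference is visible in the output: a timestamp belongs to exactly one tumbling window, to several overlapping hopping windows, and to a session whose bounds depend on the neighbouring records rather than on a fixed size.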
For Python, there is a library, winton-kafka-streams (https://github.com/wintoncode/winton-kafka-streams), that may be useful for you.
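If adding another library is not an option, a simple tumbling-window count can also be done by hand on top of kafka-python. A minimal sketch, assuming the message values are JSON objects like the one in the question and that the record timestamp (milliseconds) is the time to window on; `count_per_window` and the `Record` stand-in are hypothetical. Unlike Kafka Streams, it keeps every window's count in memory and never closes or emits windows:

```python
import json
from collections import Counter, namedtuple


def count_per_window(records, size_ms, field):
    """Count records per (tumbling window start, value of `field`).

    `records` can be any iterable of objects with `.timestamp` (ms since
    the epoch) and `.value` (a JSON object as str or bytes), such as a
    kafka-python KafkaConsumer.
    """
    counts = Counter()
    for record in records:
        data = json.loads(record.value)
        window_start = record.timestamp - record.timestamp % size_ms
        counts[(window_start, data.get(field))] += 1
    return counts


# Hypothetical stand-in for ConsumerRecord, for a quick offline check
Record = namedtuple("Record", ["timestamp", "value"])
demo = [
    Record(60_000, '{"country": "Nigeria"}'),
    Record(61_000, '{"country": "Nigeria"}'),
    Record(125_000, '{"country": "Peru"}'),
]
print(dict(count_per_window(demo, 60_000, "country")))
```

Against a live broker this could be called as `count_per_window(KafkaConsumer(topicName, bootstrap_servers=['localhost:9092'], auto_offset_reset='earliest', consumer_timeout_ms=5000), 60000, 'country')`; `consumer_timeout_ms` makes iteration stop after five seconds without new messages instead of blocking forever.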
Upvotes: 1
Reputation: 10065
I think the Kafka Streams API is what you need; it provides all the windowing features you are looking for. You can find more information about Kafka Streams here:
https://kafka.apache.org/documentation/streams/
Upvotes: 1