Reputation: 4665
I mean I know how to get data into kafka either by some file agent or programmatically using any of the clients, but speaking from architectural point of view...
It can't just be collecting HTTP logs.
I'm assuming when someone clicks a link or does something of interest, we can use some kind of ajax/javascript call to make a call to some microservice to capture the extra info that we want? But that's not always "reliable" per say, but do we care?
Or while the given "action" posts back to the server we simultaneously write to Kafka and perform the other action?
Upvotes: 0
Views: 1206
Reputation: 8335
It’s not clear from your question if you are trying to collect all the clickstream logs from a set of web servers, or if you are trying to selective publish some data to Kafka from your web app, so I will answer both.
The easiest way to collect every web click is to configure your web servers to use Syslog ( see http://archive.oreilly.com/pub/a/sysadmin/2006/10/12/httpd-syslog.html ) and configure your Syslog server to send data to Kafka (see https://www.balabit.com/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/configuring-destinations-kafka.html). Alternatively there are some more advanced features available in this Kafka Connector for Syslog-NG (see https://github.com/jcustenborder/kafka-connect-syslog). You can also write httpd logs to a file and use a Kafka File Connector to publish to Kafka (see https://docs.confluent.io/current/connect/connect-filestream/filestream_connector.html)
If you just want to enable your apps to send certain log data to Kafka directly you can use the Kafka REST Proxy and publish using a simple HTTP POST from either your client JavaScript or your server side logic (see https://docs.confluent.io/current/kafka-rest/docs/index.html)
Upvotes: 2