Reputation: 143
I have a server that generates log files every second, and I want to process these files using Apache Spark.
I wrote a Spark application in Python that processes a group of log files inside a while loop.
I stop the SparkContext at the end of each iteration and start a new one for the next step.
My question is: what is the best approach for this kind of application, which runs indefinitely and processes batches (groups) of generated files? Should I use an infinite while loop, run my code as a cron job, or use a scheduling framework like Airflow?
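Here is a simplified sketch of what my loop looks like now (the log path and the processing step are placeholders):

```python
import time
from pyspark import SparkContext

while True:
    sc = SparkContext(appName="LogBatchProcessor")
    # Placeholder path: pick up whatever log files are currently present
    logs = sc.textFile("/var/log/myserver/*.log")
    print(logs.count())  # stand-in for the real processing
    sc.stop()            # stop the context after each group of files
    time.sleep(1)        # wait for the next group to be generated
```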
Upvotes: 1
Views: 983
Reputation: 578
The best way to solve this is to use Spark Streaming, which lets you process live data streams. Spark Streaming currently integrates with Kafka, Flume, HDFS, S3, Amazon Kinesis, and Twitter. You should first push these logs into Kafka and then write a Spark Streaming program that processes the live stream of logs. This is a much cleaner solution than using an infinite loop and starting and stopping the SparkContext multiple times.
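As a starting point, here is a minimal Spark Streaming sketch. It assumes a Kafka broker at localhost:9092 and a topic named server-logs (both placeholders you would replace with your own setup), and uses the createDirectStream API from pyspark.streaming.kafka:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="LogStreamProcessor")
ssc = StreamingContext(sc, batchDuration=5)  # process logs in 5-second micro-batches

# Connect directly to Kafka; each record arrives as a (key, value) pair.
stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["server-logs"],  # placeholder topic name
    kafkaParams={"metadata.broker.list": "localhost:9092"},
)

# Example processing: count log lines containing "ERROR" in each batch.
error_counts = stream.map(lambda kv: kv[1]) \
                     .filter(lambda line: "ERROR" in line) \
                     .count()
error_counts.pprint()

ssc.start()             # one long-lived context; no stop/start per batch
ssc.awaitTermination()  # run until explicitly stopped
```

Note that the StreamingContext is created once and runs until the application is terminated, so the repeated stop/start of the SparkContext disappears entirely.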
Upvotes: 3