How to map a log file in Java using Spark?

Question

I have to monitor a log file in which is written the history of utilization of an app. This log file is formatted in this way:






... about 800000 rows

AppId is always the same, because is referenced at only one app, date is expressed in this format dd/mm/yyyy hh/mm cpuUsage and memoryUsage are expressed in % so for example:

<3ghffh3t482age20304,230720142245,0.2,3,5>

To be specific, I have to check the percentage of CPU usage and memory usage by this application to be monitored using spark and the map reduce algorithm.

My output is to print alert when the cpu or the memory are 100% of usage.

How can I start?

Sathish · Accepted Answer

The idea is to declare a class and map the line into a scala object,

Lets declare the case class as follows,

case class App(name: String, date: String, cpuUsage: Double, memoryusage: Double)

Then initialize the SparkContext and create a RDD from the text file where the data is present,

val sc = new SparkContext(sparkConf)
val inFile = sc.textFile("log.txt")

then parse each line and map it to App object so that the range checking would be faster,

val mappedLines = inFile.map(x => (x.split(",")(0), parse(x)))

where the parse(x) method is defined as follows,

 def parse(x: String):App = {
   val splitArr = x.split(",");
   val app = new App(splitArr(0),
                      splitArr(1),
                      splitArr(2).toDouble,
                      splitArr(3).toDouble)
   return app
}

Note that i have assumed the input as follows, (this is just to give you the idea and not the entire program),

ffh3t482age20304,230720142245,0.2,100.5

Then do the filter transformation where you can perform the check and report the anamoly conditions,

val anamolyLines = mappedLines.filter(doCheckCPUAndMemoryUtilization)
anamolyLines.count()

where doCheckCPUAndMemoryUtilization function is defined as follows,

def doCheckCPUAndMemoryUtilization(x:(String, App)):Boolean = {
    if(x._2.cpuUsage >= 100.0 ||
       x._2.memoryusage >= 100.0) {
       System.out.println("App name -> "+x._2.name +" exceed the limit")
       return true
    }

    return false
}

Note: This is only a batch processing and not real-time processing.

How to map a log file in Java using Spark?

Answers (1)

Related Questions