Reputation: 279
I have a log file with entries like this:
10.28 INFO [EFKLogger] - POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533
Using Spark, I want to count the number of times each queue is hit every hour. The queue here is POGUpdateTenestenerServiceImpl. I want a JavaRDD that contains only the time and the queue so I can perform operations on it. I am new to Spark and have only found ways to create an RDD of either all the words or whole lines. I only want two words from each line. How can I achieve this?
Upvotes: 0
Views: 69
Reputation: 3316
You should use the textFile function of the SparkContext to read the file.
Here is a Scala example; it can be translated easily to Java:
val text = sc.textFile("data.csv")                // read the file
val words = text.map(line => line.split(" "))     // split each line into words
Now words is an RDD of word arrays; for each line you can take the first and second elements you care about and do whatever you want with them.
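Since you want a JavaRDD, here is a minimal sketch of the per-line extraction in plain Java, with the field positions assumed from the sample entry in your question (time is the first token, the queue name is the fifth with a trailing colon). Inside Spark you would pass the same logic to mapToPair on the JavaRDD&lt;String&gt; returned by textFile to get a JavaPairRDD of (time, queue):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class LogFields {
    // Pull (time, queue) out of one log line. Token positions are assumed
    // from the sample entry: parts[0] is the time, parts[4] is the queue
    // name followed by a colon.
    static Map.Entry<String, String> timeAndQueue(String line) {
        String[] parts = line.split(" ");
        return new SimpleEntry<>(parts[0], parts[4].replace(":", ""));
    }

    public static void main(String[] args) {
        String sample = "10.28 INFO [EFKLogger] - POGUpdateTenestenerServiceImpl:"
                + " Entering listener with object 624866045533";
        Map.Entry<String, String> pair = timeAndQueue(sample);
        System.out.println(pair.getKey() + " " + pair.getValue());
        // In Spark: sc.textFile("log.txt").mapToPair(l -> ...) with the same
        // logic gives a JavaPairRDD<String, String> of (time, queue), which
        // you can then count per hour with reduceByKey.
    }
}
```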
Upvotes: 1