Evan Root

Reputation: 279

Not able to parse a file using Java Spark API

I have a log file with entries like this

10.28 INFO  [EFKLogger] - POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533

Now, using Spark, I want to count the number of queues getting hit every hour. The queue here is POGUpdateTenestenerServiceImpl. I want a JavaRDD that contains only the time and the queue, so I can perform operations on it. I am new to Spark and have only found ways to create an RDD of all the words or of whole lines. I only want two words from each line. How can I achieve this?

Upvotes: 0

Views: 69

Answers (1)

antonpuz

Reputation: 3316

You should use the textFile function of the SparkContext to read the file:

Here is a Scala example; it can easily be translated to Java:

val text = sc.textFile("data.csv") // Read the file
val words = text.map(line => line.split(" ")) // Break each line into words

Now words is an RDD where each element is an array of the words from one line; you can take the first and second entries of each array and do whatever you want with them.
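Since you asked about Java: a minimal sketch of the same idea, assuming the log format is whitespace-delimited with the timestamp as the first token and the queue name as the fifth token (with a trailing colon), as in your sample line. The parsing is plain Java, so you can drop it straight into a Spark map; the Spark wiring itself is shown in the comments.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class LogParse {
    // Extract (time, queue) from a line like:
    // "10.28 INFO  [EFKLogger] - POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533"
    static Map.Entry<String, String> parse(String line) {
        String[] tokens = line.split("\\s+");      // split on runs of whitespace (handles double spaces)
        String time  = tokens[0];                  // e.g. "10.28"
        String queue = tokens[4].replace(":", ""); // strip the trailing colon from the queue name
        return new SimpleEntry<>(time, queue);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = parse(
            "10.28 INFO  [EFKLogger] - POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533");
        System.out.println(e.getKey() + " " + e.getValue());

        // Inside Spark you would apply the same logic per line, e.g.:
        // JavaRDD<String> lines = sc.textFile("data.log");
        // JavaPairRDD<String, String> pairs = lines.mapToPair(l -> {
        //     String[] t = l.split("\\s+");
        //     return new Tuple2<>(t[0], t[4].replace(":", ""));
        // });
    }
}
```

From the resulting JavaPairRDD you can group or count by key to get hits per hour.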

Upvotes: 1
