Gladiator

Reputation: 644

How to segregate lines which match my condition in Scala?

I am trying to rewrite a SQL query in Scala.

  1. The file is pipe-separated.
  2. The field Message is present in the 4th column of the file.
  3. The msg value used in the query below is the 3rd comma-separated part of the Message field (the part that starts with MESSAGE >>>).

Sample file data:

[06-26 00:01:52,036] | Container : 5 | INFO  | relation ID: 00002ZaaaaaaXdsZb:-1:55609051-1879-4be8-b1c9-1d2006b17135, Message: acadeontroller.java recordLogRequest - 50 (...)     , MESSAGE >>>  API - XX_XX_XX {CHECKSUM=9ABF5975467E394F54442FBD4F6473D3,MEMBER_TYPE=}

The query looks like the below:

INSERT OVERWRITE TABLE staging.cleaned_data_7
SELECT * FROM staging.cleaned_data_6
WHERE msg  NOT LIKE '%KEEP_ALIVE%'
AND msg  NOT LIKE '%XXX_CHANNEL_SERVICE%'
AND msg  NOT LIKE '%XXX Finished%'
AND msg  NOT LIKE '%API -%'
;

I tried two ways. The first uses map and filter, but the map calls discard everything except the msg part, so I can only get that part back and not the entire matching record. Since it's a SELECT * query, I can't use this.

val sample = sc.textFile("file:////home/user/sample.txt").map(x=>x.split('|')(3)).map(x=>x.split(',')(2))
val myFilter = sample.filter(x =>
  !(x contains "KEEP_ALIVE") && 
  !(x contains "XXX_CHANNEL_SERVICE") && 
  !(x contains "XXX Finished") && 
  !(x contains "API -") )

The second way uses the partition function, but it fails with a compile error.

val (valid,invalid) = readFile.partition{ line=>
  val Message = line.split('|')(3).split(',')(2).toString

  Message.filter(x =>
    !(x contains "KEEP_ALIVE") && 
    !(x contains "XXX_CHANNEL_SERVICE") && 
    !(x contains "XXX Finished") && 
    !(x contains "API -")
  )
}

<console>:48: error: value contains is not a member of Char

Upvotes: 1

Views: 131

Answers (2)

Vasiliy Ivashin

Reputation: 425

Try doing the split inside filter, like this:

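// Note: `sample` must hold the raw lines here, i.e. the result of
// sc.textFile(...) without the two map calls from the question.
val skippedMessages = List("KEEP_ALIVE", "XXX_CHANNEL_SERVICE", "XXX Finished", "API -")
val result = sample.filter { line =>
  // Extract msg only to test it; the whole line is what gets kept.
  val message = line.split('|')(3).split(',')(2)
  !skippedMessages.exists(message.contains)
}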
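To try the same idea without a Spark cluster, here is a minimal sketch on a plain Scala List. The names FilterSketch, msgOf and keep, and both sample lines, are made up for illustration. The lift calls are my own addition so that a line with too few fields does not throw; keeping such lines is an assumption, not something the question specifies.

```scala
object FilterSketch {
  // Substrings that mark a record for removal (taken from the SQL query).
  val skippedMessages = List("KEEP_ALIVE", "XXX_CHANNEL_SERVICE", "XXX Finished", "API -")

  // Hypothetical helper: 4th pipe-separated field, then its 3rd comma-separated part.
  // lift returns None instead of throwing when a line has too few fields.
  def msgOf(line: String): Option[String] =
    line.split('|').lift(3).flatMap(_.split(',').lift(2))

  // Keep the whole line unless its msg contains one of the skipped substrings.
  // Lines without a msg part are kept (an assumption; adjust as needed).
  def keep(line: String): Boolean =
    !msgOf(line).exists(m => skippedMessages.exists(s => m.contains(s)))

  // Made-up sample records in the question's layout.
  val lines = List(
    "[06-26 00:01:52,036] | Container : 5 | INFO  | relation ID: 1, Message: a.java - 50 , MESSAGE >>>  API - XX_XX_XX {CHECKSUM=9A,MEMBER_TYPE=}",
    "[06-26 00:01:53,100] | Container : 5 | INFO  | relation ID: 2, Message: b.java - 51 , MESSAGE >>>  LOGIN_OK {CHECKSUM=1B,MEMBER_TYPE=}"
  )

  def main(args: Array[String]): Unit =
    lines.filter(keep).foreach(println) // prints the full LOGIN_OK record only
}
```

The predicate only inspects msg, so filter hands back complete records, which is what SELECT * needs.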

Upvotes: 2

jwvh

Reputation: 51271

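After this statement: val message = line.split('|')(3).split(',')(2).toString, the variable message is a String. (The .toString is redundant, because split already returns Strings.)

When you filter() on a String you are extracting individual Char elements and filtering which Chars to keep and which ones to leave out. That is why the compiler complains that contains is not a member of Char: the x in your lambda is a single Char, not the message.

Also, the partition() method requires a predicate that returns a Boolean, whereas filter() on a String returns another String.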
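A small sketch of both points, using a made-up message value (the names WhyItFails, noSpaces and isNoise are illustrative only):

```scala
object WhyItFails {
  val message: String = "MESSAGE >>>  API - XX_XX_XX"

  // filter on a String visits one Char at a time and returns a String,
  // so inside the lambda `c` is a Char and has no `contains` method.
  val noSpaces: String = message.filter(c => c != ' ')

  // What partition needs instead: one Boolean computed from the whole String.
  val isNoise: Boolean = message.contains("API -")
}
```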

Try this and see if it gets you closer.

val (valid,invalid) = readFile.partition{ line=>
  val message = line.split('|')(3).split(',')(2).toString

  !(message contains "KEEP_ALIVE") && 
  !(message contains "XXX_CHANNEL_SERVICE") && 
  !(message contains "XXX Finished") && 
  !(message contains "API -")
}
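Here is a runnable sketch of that on a plain List. It assumes readFile is a local Scala collection of raw lines, since the question does not show how it is created; as far as I know a Spark RDD has no partition method that takes a predicate, so there you would probably need two filter calls instead. The object name, the noise list name and both sample lines are made up.

```scala
object PartitionSketch {
  // Assumption: readFile is a local collection of raw lines; these two are made up.
  val readFile: List[String] = List(
    "t1 | c | INFO | id: 1, Message: a , MESSAGE >>>  KEEP_ALIVE {X=1}",
    "t2 | c | INFO | id: 2, Message: b , MESSAGE >>>  ORDER_PLACED {X=2}"
  )

  // Substrings from the NOT LIKE clauses of the query.
  val noise = List("KEEP_ALIVE", "XXX_CHANNEL_SERVICE", "XXX Finished", "API -")

  // partition returns (lines where the predicate is true, lines where it is false).
  val (valid, invalid) = readFile.partition { line =>
    val message = line.split('|')(3).split(',')(2)
    !noise.exists(n => message.contains(n))
  }

  def main(args: Array[String]): Unit = {
    println(s"valid:   $valid")
    println(s"invalid: $invalid")
  }
}
```

Both halves hold complete lines, so valid corresponds to the rows the SQL query would keep and invalid to the rows it would drop.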

Upvotes: 1
