I am trying to rewrite a SQL query in Scala. Message is present in the 4th column of the file. msg, as shown below in the query, is present in the 3rd column of Message, which is a CSV (MESSAGE >>>).
Sample file data:
[06-26 00:01:52,036] | Container : 5 | INFO | relation ID: 00002ZaaaaaaXdsZb:-1:55609051-1879-4be8-b1c9-1d2006b17135, Message: acadeontroller.java recordLogRequest - 50 (...) , MESSAGE >>> API - XX_XX_XX {CHECKSUM=9ABF5975467E394F54442FBD4F6473D3,MEMBER_TYPE=}
The query looks like this:
INSERT OVERWRITE TABLE staging.cleaned_data_7
SELECT * FROM staging.cleaned_data_6
WHERE msg NOT LIKE '%KEEP_ALIVE%'
  AND msg NOT LIKE '%XXX_CHANNEL_SERVICE%'
  AND msg NOT LIKE '%XXX Finished%'
  AND msg NOT LIKE '%API -%';
I tried two ways. The first way is to use map and filter, but that cannot return the entire record that matches the condition: after the map I only have the Message field left. Since it's a SELECT * query, I can't use this.
val sample = sc.textFile("file:////home/user/sample.txt").map(x=>x.split('|')(3)).map(x=>x.split(',')(2))
val myFilter = sample.filter(x =>
!(x contains "KEEP_ALIVE") &&
!(x contains "XXX_CHANNEL_SERVICE") &&
!(x contains "XXX Finished") &&
!(x contains "API -") )
Method two: I am using the partition function, but I'm facing an error.
val (valid,invalid) = readFile.partition{ line=>
val Message = line.split('|')(3).split(',')(2).toString
Message.filter(x =>
!(x contains "KEEP_ALIVE") &&
!(x contains "XXX_CHANNEL_SERVICE") &&
!(x contains "XXX Finished") &&
!(x contains "API -")
)
}
<console>:48: error: value contains is not a member of Char
Upvotes: 1
Views: 131
Reputation: 425
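Try doing the split inside filter, on the original unsplit lines, so that the whole record is kept, like this:
val skippedMessages = List("KEEP_ALIVE", "XXX_CHANNEL_SERVICE", "XXX Finished", "API -")
// readFile is the collection of raw lines from the question (not the already-split sample)
val result = readFile.filter { line =>
  val message = line.split('|')(3).split(',')(2)
  !skippedMessages.exists(message.contains)
}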
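If some lines might have fewer than four pipe-separated columns, the indexing above would throw. Here is a minimal Spark-free sketch of a safer predicate. The names RecordFilter, extractMessage and keepLine are made up for illustration, and keeping lines that lack the field is my assumption, not something the question specifies.

```scala
object RecordFilter {
  // Substrings taken from the NOT LIKE clauses in the question's SQL.
  val skippedMessages: List[String] =
    List("KEEP_ALIVE", "XXX_CHANNEL_SERVICE", "XXX Finished", "API -")

  // Hypothetical helper: 4th pipe-separated column, then its 3rd
  // comma-separated field. Returns None instead of throwing when
  // the line has too few fields.
  def extractMessage(line: String): Option[String] =
    line.split('|').lift(3).flatMap(_.split(',').lift(2))

  // Assumption: a line without the expected field is kept, because
  // it cannot contain any of the skipped substrings.
  def keepLine(line: String): Boolean =
    !extractMessage(line).exists(msg => skippedMessages.exists(s => msg.contains(s)))
}
```

The predicate takes the whole line and returns a Boolean, so it can be passed to filter on the raw lines, for example readFile.filter(RecordFilter.keepLine), and the full record survives.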
Upvotes: 2
Reputation: 51271
After this statement: val message = line.split('|')(3).split(',')(2).toString, the variable message is a String.
When you filter() on a String you are extracting individual Char elements and filtering which Chars to keep and which ones to leave out.
Also, the partition() method requires a Boolean result, which filter() doesn't supply.
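To make the difference concrete, here is a small sketch (the object name and the sample string are made up for illustration). filter on a String hands each Char to the predicate and gives back a String, while contains gives back the Boolean that partition() needs.

```scala
object StringFilterDemo {
  val message: String = "MESSAGE >>> API - XX"

  // filter on a String visits one Char at a time and returns a new String,
  // which is why calling contains on the lambda argument fails to compile.
  val lettersOnly: String = message.filter(c => c.isLetter)

  // contains works on the whole String and returns a Boolean.
  val hasApi: Boolean = message.contains("API -")
}
```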
Try this and see if it gets you closer.
val (valid,invalid) = readFile.partition{ line=>
val message = line.split('|')(3).split(',')(2).toString
!(message contains "KEEP_ALIVE") &&
!(message contains "XXX_CHANNEL_SERVICE") &&
!(message contains "XXX Finished") &&
!(message contains "API -")
}
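For reference, this is roughly how the same predicate behaves on a plain Scala List. The two input lines are invented, and the sketch assumes readFile is an ordinary Scala collection. If it is a Spark RDD instead, as far as I know RDD has no partition method, so calling filter twice (once with the predicate, once with its negation) would be one way to get both halves.

```scala
object PartitionDemo {
  val skipped: List[String] =
    List("KEEP_ALIVE", "XXX_CHANNEL_SERVICE", "XXX Finished", "API -")

  // Same extraction as above; assumes every line is well formed.
  def isValid(line: String): Boolean = {
    val message = line.split('|')(3).split(',')(2)
    !skipped.exists(s => message.contains(s))
  }

  // Invented sample lines in the same shape as the question's data.
  val lines: List[String] = List(
    "t | c | INFO | id: 1, Message: m , MESSAGE >>> KEEP_ALIVE ping",
    "t | c | INFO | id: 2, Message: m , MESSAGE >>> Login OK"
  )

  // partition returns (elements where the predicate is true, the rest).
  val (valid, invalid) = lines.partition(isValid)
}
```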
Upvotes: 1