CNR

regexp_extract in a Scala DataFrame is giving an error

I am trying to convert the Hive SQL expression below into a Spark DataFrame expression, and I am getting an error.

trim(regexp_extract(message_comment_txt, '(^.*paid\\s?\\$?)(.*?)(\\s?toward.*)', 2))

Sample data: message_comment_txt = "DAY READER, paid 12.76 toward the cost"

I need the output to be 12.76.

Please help me write the equivalent Spark DataFrame statement.
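Before touching Spark, it can help to check which capture group actually holds the amount. A minimal sketch in plain Scala, assuming that Spark's `regexp_extract` behaves like a first `find()` with `java.util.regex` (which Scala's `Regex` also uses); the object `HivePatternCheck` and the helper `extract` are hypothetical names, and `extract` only mimics `regexp_extract` by returning an empty string when nothing matches.

```scala
import scala.util.matching.Regex

object HivePatternCheck {
  // The Hive pattern from the question, written as an ordinary Scala string literal
  val hivePattern: Regex = "(^.*paid\\s?\\$?)(.*?)(\\s?toward.*)".r

  val sample = "DAY READER, paid 12.76 toward the cost"

  // Hypothetical helper mimicking regexp_extract: first match, requested group,
  // and "" when the pattern or the group does not match
  def extract(p: Regex, s: String, group: Int): String =
    p.findFirstMatchIn(s).flatMap(m => Option(m.group(group))).getOrElse("")

  def main(args: Array[String]): Unit = {
    // Group 1 is everything up to and including "paid ", group 2 is the amount
    println(extract(hivePattern, sample, 2)) // 12.76
  }
}
```

Group 1 ends with the space after "paid", which is why the lazy group 2 starts exactly at the number.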

Answers (1)

notNull

Try the regex paid\\s+(.*?)\\s+toward and take capture group 1.

import org.apache.spark.sql.functions.{col, regexp_extract}

// df is assumed to have a string column named message_comment_txt
df.withColumn("extract", regexp_extract(col("message_comment_txt"), "paid\\s+(.*?)\\s+toward", 1)).show(false)
// case-insensitive: a single (?i) at the start applies to the rest of the pattern
df.withColumn("extract", regexp_extract(col("message_comment_txt"), "(?i)paid\\s+(.*?)\\s+toward", 1)).show(false)
//+--------------------------------------+-------+
//|message_comment_txt                   |extract|
//+--------------------------------------+-------+
//|DAY READER, paid 12.76 toward the cost|12.76  |
//+--------------------------------------+-------+
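If you would rather keep the original Hive pattern and its `trim`, it can also be ported more or less directly. A minimal sketch, assuming Spark 3.x with the default `spark.sql.parser.escapedStringLiterals=false`; the object `HivePortSketch`, the method `withAmount`, and the output columns `amount` and `amount_sql` are hypothetical names. It shows the Column API version next to an `expr(...)` version that reuses the Hive text unchanged inside a triple-quoted Scala string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, regexp_extract, trim}

object HivePortSketch {
  // Hive pattern as a normal Scala string: "\\s" becomes the regex \s
  val hivePattern = "(^.*paid\\s?\\$?)(.*?)(\\s?toward.*)"

  def withAmount(df: DataFrame): DataFrame =
    df
      // Column API: group 2 holds the amount, trim removes stray whitespace
      .withColumn("amount", trim(regexp_extract(col("message_comment_txt"), hivePattern, 2)))
      // SQL string: the triple-quoted literal is not unescaped by Scala,
      // so the SQL parser sees '\\s' and turns it into the regex \s
      .withColumn("amount_sql",
        expr("""trim(regexp_extract(message_comment_txt, '(^.*paid\\s?\\$?)(.*?)(\\s?toward.*)', 2))"""))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("regexp-extract-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("DAY READER, paid 12.76 toward the cost").toDF("message_comment_txt")
    withAmount(df).show(false)

    spark.stop()
  }
}
```

The escaping differs between the two forms because the Column API receives the regex after Scala unescapes it once, while `expr` passes the text through the SQL string-literal parser instead.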
