Reputation: 11
I am trying to convert the Hive SQL expression below into a Spark DataFrame expression, but I am getting an error.
trim(regexp_extract(message_comment_txt, '(^.*paid\\s?\\$?)(.*?)(\\s?toward.*)', 2))
Sample data: message_comment_txt = "DAY READER, paid 12.76 toward the cost"
I need to get the output as 12.76
Please help me write the equivalent Spark DataFrame statement.
Upvotes: 1
Views: 2305
Reputation: 31490
Try the regex paid\\s+(.*?)\\s+toward and take capture group 1.
df.withColumn("extract",regexp_extract(col("message_comment_txt"),"paid\\s+(.*?)\\s+toward",1)).show(false)
// Case-insensitive variant: a single leading (?i) applies to the whole pattern
df.withColumn("extract",regexp_extract(col("message_comment_txt"),"(?i)paid\\s+(.*?)\\s+toward",1)).show(false)
//+--------------------------------------+-------+
//|message_comment_txt                   |extract|
//+--------------------------------------+-------+
//|DAY READER, paid 12.76 toward the cost|12.76  |
//+--------------------------------------+-------+
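If it helps, the patterns can be checked without a Spark session. Spark's regexp_extract uses Java regular expressions, and so does Scala's own Regex class, so a plain Scala sketch like the one below should behave the same way on the sample string. The object name RegexCheck and the helper extract are hypothetical names made up for this illustration; they are not part of Spark.

```scala
object RegexCheck {
  // Hypothetical helper that mimics regexp_extract: returns the requested
  // capture group of the first match, or "" when nothing matches.
  def extract(input: String, pattern: String, group: Int): String =
    pattern.r
      .findFirstMatchIn(input)
      .flatMap(m => Option(m.group(group))) // group is null if it did not participate
      .getOrElse("")

  def main(args: Array[String]): Unit = {
    val sample = "DAY READER, paid 12.76 toward the cost"

    // Triple-quoted strings avoid doubling the backslashes.
    val simple = """paid\s+(.*?)\s+toward"""
    val hive   = """(^.*paid\s?\$?)(.*?)(\s?toward.*)"""

    println(extract(sample, simple, 1))                      // 12.76
    println(extract(sample, hive, 2).trim)                   // 12.76
    println(extract(sample.toUpperCase, "(?i)" + simple, 1)) // 12.76
  }
}
```

The second call suggests that the original Hive pattern with group 2 also yields 12.76 on this sample. In a regular double-quoted Scala string the same patterns need every backslash doubled, as in the withColumn calls above.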
Upvotes: 2