Reputation: 1940
How can I clean a column using Hive HQL containing this type of string
["A","B","C"]
["ABC","RFG","ERD","BAC"]
in order to get only the values inside " "
separated by the ,
?
Desired output
A,B,C
ABC,RFG,ERD,BAC
I've tried to use regexp_replace
:
regexp_replace(s_agr.sig_tecnologia,'["\]\[]','')
but I'm receiving this error below:
Error: Error while compiling statement: FAILED: SemanticException [Error 10014]: Line 53:1 Wrong arguments '''': No matching method for class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with (array<string>, string, string). Possible choices: _FUNC_(string, string, string) (state=42000,code=10014)
Upvotes: 1
Views: 1697
Reputation: 37473
You can try the below -
select regexp_replace(s_agr.sig_tecnologia, '\\[\\"\\]', '')
Upvotes: 1
Reputation: 1940
I've solved the problem with this question here.
The problem is because I was using collect_set
to generate all those values together and they were of type array of strings
not strings
.
If I then use concat_ws(',',collect_set(column))
my problem is solved.
Upvotes: 2
Reputation: 1269823
How about replace()
?
select replace(replace(replace(col, '"', ''), '[', ''), ']', '')
Upvotes: 3