Henrique Branco
Henrique Branco

Reputation: 1940

Remove characters [ ] and ' with hive hql

How can I clean a column using Hive HQL containing this type of string

["A","B","C"]
["ABC","RFG","ERD","BAC"]

in order to get only the values inside " " separated by the ,?

Desired output

A,B,C
ABC,RFG,ERD,BAC

I've tried to use regexp_replace:

regexp_replace(s_agr.sig_tecnologia,'["\]\[]','')

but I'm receiving this error below:

Error: Error while compiling statement: FAILED: SemanticException [Error 10014]: Line 53:1 Wrong arguments '''': No matching method for class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with (array<string>, string, string). Possible choices: _FUNC_(string, string, string) (state=42000,code=10014)

Upvotes: 1

Views: 1697

Answers (3)

Fahmi
Fahmi

Reputation: 37473

You can try the below -

select regexp_replace(s_agr.sig_tecnologia, '\\[\\"\\]', '')

Upvotes: 1

Henrique Branco
Henrique Branco

Reputation: 1940

I've solved the problem with this question here.

The problem is because I was using collect_set to generate all those values together and they were of type array of strings not strings.

If I then use concat_ws(',',collect_set(column)) my problem is solved.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269823

How about replace()?

select replace(replace(replace(col, '"', ''), '[', ''), ']', '')

Upvotes: 3

Related Questions