Shankar
Shankar

Reputation: 8967

Remove leading zeros using HiveQL

I have a string value in which i might have leading zero's, so i want to remove all leading zeros.

For example:

accNumber = "000340" ---> "340"

Any UDF is available in Hive? can we use regexp_extract for this?

Upvotes: 3

Views: 12596

Answers (3)

Indent
Indent

Reputation: 4967

You can use a better regex if you need preserve :

  • single ZERO
  • invalid NUMBER

^0+(?!$) => using Negative Lookahead (?!$)

with
cte_data_test as (
    select '0123'               as txt
    union all
    select '00123'              as txt
    union all
    select '0'                  as txt
    union all
    select '0000'               as txt
    union all
    select cast(null as string) as txt
    union all
    select 'bad_number'         as txt
)
select
    txt,
    regexp_replace(txt,'^0+(?!$)','')  as using_regexp_replace_a,
    regexp_replace(txt, "^0+", '')     as using_regexp_replace_b,
    cast(cast(txt as INT) as STRING)   as using_cast
from 
    cte_data_test
;

will produce :

+-------------+-------------------------+-------------------------+-------------+
|     txt     | using_regexp_replace_a  | using_regexp_replace_b  | using_cast  |
+-------------+-------------------------+-------------------------+-------------+
| 0123        | 123                     | 123                     | 123         |
| 00123       | 123                     | 123                     | 123         |
| 0           | 0                       |                         | 0           |
| 0000        | 0                       |                         | 0           |
| NULL        | NULL                    | NULL                    | NULL        |
| bad_number  | bad_number              | bad_number              | NULL        |
+-------------+-------------------------+-------------------------+-------------+

Upvotes: 2

Ajay Ahuja
Ajay Ahuja

Reputation: 1313

You can also use,

SELECT cast(cast("000340" as INT) as STRING) col_without_leading_zeroes 
  FROM db.table;

output : 340 (Datatype will be string)

Hope this is helpful.

Upvotes: 1

o-90
o-90

Reputation: 17593

Yes, just use REGEXP_REPLACE().

SELECT some_string,
   REGEXP_REPLACE(some_string, "^0+", '') stripped_string
FROM db.tbl

(fixed simple typo with comma)

Upvotes: 7

Related Questions