Bala
Bala

Reputation: 11234

How to limit characters when using regexp_extract in hive?

I have a fixed length string in which I need to extract portions as fields. First 5 characters to ACCOUNT1, next 2 characters to ACCOUNT2 and so on.

I would like to use regexp_extract (not substring) but I am missing the point. They return nothing.

select regexp_extract('47t7916A2088M040323','(.*){0,5}',1) as ACCOUNT1,
       regexp_extract('47t7916A2088M040323','(.*){6,8}',1) as ACCOUNT2 --and so on

Upvotes: 2

Views: 1284

Answers (1)

leftjoin
leftjoin

Reputation: 38290

If you want using regexp then use it like in this example. For Account1 expression '^(.{5})' means: ^ is a beginning of the string, then capturing group 1 consisting of any 5 characters (group is in the round brackets). {5} - is a quantifier, means exactly 5 times. For Account2 - capturing group 2 is after group1. (.{2}) - means two any characters.

In this example in the second regexp there are two groups(for first and second column) and we extracting second group.

hive> select regexp_extract('47t7916A2088M040323','^(.{5})',1) as Account1,
    >  regexp_extract('47t7916A2088M040323','^(.{5})(.{2})',2) as Account2;
OK
47t79   16
Time taken: 0.064 seconds, Fetched: 1 row(s)

Actually you can use the same regexp containing groups for all columns, extracting different capturing groups.
Example using the same regexp and extracting different groups:

hive> select regexp_extract('47t7916A2088M040323','^(.{5})(.{2})',1) as Account1,
    > regexp_extract('47t7916A2088M040323','^(.{5})(.{2})',2) as Account2
    > ;
OK
47t79   16
Time taken: 1.043 seconds, Fetched: 1 row(s)

Add more groups for each column. This approach works only for fixed length columns. If you want to parse delimited string, then put the delimiter characters between groups, modify group to match everything except delimiters and remove/modify quantifiers. For such example substring or split for delimited string looks much more simple and cleaner, regexp allows to parse very complex patterns. Hope you have caught the idea.

Upvotes: 1

Related Questions