Dee

Reputation: 309

Is regex for Hive different from the normal regex?

I am using Hive to analyse a web log whose lines look like this:

415503 - - [10/Jun/1998:00:48:00 +0000] "GET /english/images/nav_sitemap_off.gif HTTP/1.1" 200 416

I used the regex below to load this into a Hive table, which works fine:

([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)

But when I test this regex at https://www.regex101.com/, it doesn't match my string.

If I remove some of the backslashes in the block

(-|\\[[^\\]]*\\]) 

it matches.

I think we have to add an extra \ to escape characters when it comes to regex in Hive? But how do I validate the regex before creating the database?

Upvotes: 3

Views: 3642

Answers (1)

Ronak Patel

Reputation: 3849

Hive uses Java regex syntax. Try http://www.fileformat.info/tool/regex.htm for testing purposes.

See Apache Hive - REGEXColumnSpecification for details.

Test it with your own input.
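Since Hive uses Java regex syntax, one way to validate the pattern before creating anything is a small standalone Java program. This is only a sketch: the class name `HiveRegexCheck` and the method `parse` are made up for illustration, and it assumes that Hive unescapes `\\` and `\"` in its string literals the same way the Java compiler does, and that the whole line must match. Under those assumptions the pattern can be pasted exactly as written in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for checking the question's regex outside Hive.
final class HiveRegexCheck {
    // Same text as in the question. The Java compiler turns \\ into \ and \" into ",
    // so the regex engine only ever sees single backslashes.
    static final String REGEX =
        "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)";

    static final Pattern LOG_LINE = Pattern.compile(REGEX);

    // Returns the captured groups, or null if the whole line does not match.
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i + 1); // group 0 is the whole match, so skip it
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "415503 - - [10/Jun/1998:00:48:00 +0000] "
            + "\"GET /english/images/nav_sitemap_off.gif HTTP/1.1\" 200 416";

        // Prints the pattern after unescaping, i.e. the form an online tester expects.
        System.out.println(REGEX);

        String[] groups = parse(line);
        if (groups == null) {
            System.out.println("no match");
            return;
        }
        for (int i = 0; i < groups.length; i++) {
            System.out.println((i + 1) + ": " + groups[i]);
        }
    }
}
```

The first line printed is the pattern with single backslashes, which is the version an online tester such as regex101 expects. That also explains why the doubled-backslash form fails there: the tester takes the text as a raw regex, not as a string literal.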

Upvotes: 7
