I am using Hive to analyse a web log whose lines look like this:
415503 - - [10/Jun/1998:00:48:00 +0000] "GET /english/images/nav_sitemap_off.gif HTTP/1.1" 200 416
I used the regex below to load it into a Hive table, and that works fine:
([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)
But if I test this regex on https://www.regex101.com/, it doesn't match my string.
If I remove some of the backslashes in the block
(-|\\[[^\\]]*\\])
it matches.
Do we have to add an extra \ to escape characters when it comes to regexes in Hive? And if so, how do I validate the pattern before creating the database?
Hive uses Java regex syntax. Try http://www.fileformat.info/tool/regex.htm for testing purposes.
See Apache Hive - REGEXColumnSpecification for details.
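Below is a sketch of one way to check the pattern before running any DDL. It assumes (my understanding of Hive's behaviour, not something stated above) that Hive unescapes string literals in the DDL, so each \\ reaches the regex engine as a single \ and each \" as a plain ". That would explain the regex101 result: pasted verbatim, \\ means "a literal backslash", and the sample line contains none. The helper hive_literal_to_pattern is hypothetical and handles only those two escapes. Python's re stands in for java.util.regex here; for this particular pattern (character classes, alternation, escaped brackets) the two engines behave the same way.

```python
import re

# Hypothetical helper: undo the two escapes that appear in the question's
# DDL string: \\ becomes \ and \" becomes ". Hive's own unescaping of
# string literals covers more sequences; this handles only this pattern.
def hive_literal_to_pattern(ddl_text: str) -> str:
    out = []
    i = 0
    while i < len(ddl_text):
        ch = ddl_text[i]
        if ch == "\\" and i + 1 < len(ddl_text) and ddl_text[i + 1] in '\\"':
            out.append(ddl_text[i + 1])  # keep only the escaped character
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The regex exactly as typed between the quotes of "input.regex" in the DDL.
ddl_regex = r'([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)'

line = ('415503 - - [10/Jun/1998:00:48:00 +0000] '
        '"GET /english/images/nav_sitemap_off.gif HTTP/1.1" 200 416')

pattern = hive_literal_to_pattern(ddl_regex)
print(pattern)  # the single-backslash form to paste into a regex tester

# Whole-line match, assumed to mirror how RegexSerDe checks each row.
match = re.fullmatch(pattern, line)
print(match.groups() if match else "no match")
```

If the assumption holds, the printed pattern (not the DDL text) is what should be pasted into regex101 or the fileformat.info tester, and the seven groups printed afterwards correspond to the seven table columns.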