happy

Reputation: 2628

Lucene Regex for alphanumeric match but not all numeric

I want a Lucene (automaton) regex that matches alphanumeric words, but not words that are entirely numeric or entirely alphabetic. I have tried:

(([a-zA-Z0-9]{1,10})&(.*[0-9].*))

but this also returns all-numeric words. So I tried to exclude all-numeric words as below, but it does not work:

(^[0-9])(([a-zA-Z0-9]{1,10})&(.*[0-9].*))

Input String:

  1. DL200, dal2 , 700091

Expected output: DL200 and dal2

but it should not return 700091

Upvotes: 0

Views: 1527

Answers (2)

JvdV

Reputation: 75950

I didn't know much about the Lucene regex flavor, but a little research taught me that it is not PCRE-compatible, although some standard operators are supported. I found that it supports neither lookarounds nor word boundaries. Have a look at the docs.

Either way, to work around the lack of lookarounds, I had a look at this older SO post and used the ~ (complement) operator instead. Furthermore, you can use the & (intersection) operator to check that the string matches multiple patterns.

Based on that, I assume the following pattern might work for you:

~[0-9]+&~[^0-9]+&[A-Za-z0-9]{2,10}
  • ~[0-9]+ - Exclude strings made of digits only.
  • & - Intersection: the string must also match the next pattern.
  • ~[^0-9]+ - Exclude strings made of non-digits only.
  • & - Intersection again.
  • [A-Za-z0-9]{2,10} - Match a string made of 2 to 10 alphanumeric characters.
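
To sanity-check the logic without a Lucene index at hand, the three conditions can be emulated in Python. This is only a sketch of the intended semantics, not Lucene itself: it assumes each ~ applies to the whole sub-expression that follows it (in Lucene you may want explicit parentheses, e.g. ~([0-9]+), if operator precedence is in doubt), it models & as a logical "and", and it uses re.fullmatch because Lucene patterns match the entire term. The function name and the sample token "dallas" are made up for illustration.

```python
import re

def matches_intended_pattern(token: str) -> bool:
    """Emulate ~([0-9]+) & ~([^0-9]+) & [A-Za-z0-9]{2,10} on a whole token."""
    # ~([0-9]+): the token must NOT consist of digits only
    not_all_digits = re.fullmatch(r"[0-9]+", token) is None
    # ~([^0-9]+): the token must NOT consist of non-digits only
    not_all_non_digits = re.fullmatch(r"[^0-9]+", token) is None
    # [A-Za-z0-9]{2,10}: 2 to 10 alphanumeric characters
    alnum_2_to_10 = re.fullmatch(r"[A-Za-z0-9]{2,10}", token) is not None
    # & is intersection: all three conditions must hold
    return not_all_digits and not_all_non_digits and alnum_2_to_10

# Tokens from the question, plus one hypothetical all-letter token
tokens = ["DL200", "dal2", "700091", "dallas"]
matched = [t for t in tokens if matches_intended_pattern(t)]
print(matched)  # ['DL200', 'dal2']
```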

Upvotes: 1

happy

Reputation: 2628

With the help of JvdV's answer and https://stackoverflow.com/a/38665819/9758194, I was able to get the desired output with:

(([a-zA-Z0-9]{1,10})&(.*[0-9].*))&~([0-9]*)
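
For anyone who wants to check the three parts of this pattern outside Lucene, here is a rough Python emulation run against the input from the question. It is a sketch under assumptions: & is modelled as a logical "and", ~(...) as "does not match as a whole", and the input is split on commas purely for illustration (how terms are actually produced depends on the analyzer in use). The function name is hypothetical.

```python
import re

def is_mixed_alnum(token: str) -> bool:
    """Emulate (([a-zA-Z0-9]{1,10})&(.*[0-9].*))&~([0-9]*) on a whole token."""
    # [a-zA-Z0-9]{1,10}: 1 to 10 alphanumeric characters
    alnum_1_to_10 = re.fullmatch(r"[a-zA-Z0-9]{1,10}", token) is not None
    # .*[0-9].*: contains at least one digit
    has_digit = re.fullmatch(r".*[0-9].*", token) is not None
    # ~([0-9]*): reject tokens that consist of digits only
    all_digits = re.fullmatch(r"[0-9]*", token) is not None
    return alnum_1_to_10 and has_digit and not all_digits

raw = "DL200, dal2 , 700091"
# Assumption: split on commas and trim whitespace to get individual terms
tokens = [part.strip() for part in raw.split(",")]
result = [t for t in tokens if is_mixed_alnum(t)]
print(result)  # ['DL200', 'dal2']
```

Requiring a digit rules out all-alphabetic words, and the complement rules out all-numeric ones, so only words mixing letters and digits remain.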

Upvotes: 1
