Betafish

Reputation: 1262

Patterns in RegexNER in CoreNLP

I would like to write a regex pattern for RegexNER inside the CoreNLP pipeline. My entity/token is

Machine_DS2302

Where the second part is alphanumeric.

What I have currently is

Machine_.*  MachineNumber

But this annotates everything, because .* is a wildcard. I would like to assign the MachineNumber tag based on a regex for the second part, i.e. the token should get that tag only if the part after _ is a valid number (alphanumeric, as above).

The regex pattern for the second part would be

^[a-zA-Z0-9]*$

But even

Machine_^[a-zA-Z0-9]*$

does not work.

What would such a pattern look like for RegexNER?

Upvotes: 1

Views: 468

Answers (1)

Wiktor Stribiżew

Reputation: 627110

The anchors are not just redundant; they actually prevent the pattern from matching, because ^ matches only at the start of the string and $ only at the end. A ^ placed after Machine_ can therefore never match.
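To see the effect of the misplaced anchor, here is a minimal sketch. It uses Python's re module purely for illustration (CoreNLP itself uses Java regular expressions, but ^ and $ behave the same way for this pattern), and it assumes the pattern is matched against the whole token. The token is the one from the question.

```python
import re

token = "Machine_DS2302"

# '^' after "Machine_" asserts start-of-string at a position that is
# already eight characters in, so this pattern can never match.
anchored = re.fullmatch(r"Machine_^[a-zA-Z0-9]*$", token)

# Without the anchors, the character class matches the suffix.
unanchored = re.fullmatch(r"Machine_[a-zA-Z0-9]*", token)

print(anchored)                # None
print(unanchored is not None)  # True
```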

Since you need access to the part after _, capture it with a capturing group:

Machine_([a-zA-Z0-9]*)

The (...) will create a submatch with the alphanumeric value.

Note that you might want to replace * with + if the alphanumeric part must consist of at least one character.
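The difference between * and +, and what the capturing group holds, can be checked with a short sketch. Again this is Python's re module used only to illustrate the regex semantics, not CoreNLP's own API, and the bare token "Machine_" is a made-up edge case.

```python
import re

star = re.compile(r"Machine_([a-zA-Z0-9]*)")
plus = re.compile(r"Machine_([a-zA-Z0-9]+)")

# Group 1 holds the alphanumeric part after the underscore.
suffix = star.fullmatch("Machine_DS2302").group(1)

# With '*', a token with nothing after the underscore still matches.
star_empty = star.fullmatch("Machine_")

# With '+', at least one alphanumeric character is required.
plus_empty = plus.fullmatch("Machine_")

print(suffix)                  # DS2302
print(star_empty is not None)  # True
print(plus_empty)              # None
```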

Upvotes: 1
