How to extract a string using Regex for POS Tagging

With reference to the earlier question, the solution given there does not work for the following example.

 "I/PRP did/VBD n't/RB experienced/VBN much/JJ service/NN differentiation/NN" The/DT desktop/NN and/CC CAD/NN support/NN is/VBZ working/VBG as/IN expected/VBN CAD-support/NNP Desktop/NNP management/NN related/VBD to/TO LSB/NNP Desktop/NNP management/NN team/NN is/VBZ very/RB committed/VBN ./." 

The result is not as expected because of the apostrophe in "n't" and the hyphen in "CAD-support". I am posting this as a new question, as requested. Can anyone help me resolve this? Thanks!

Upvotes: 3

Views: 565

Answers (1)

Wiktor Stribiżew

Reputation: 627086

If you want to use the previous solution, all you need to change is the regex to

[^\s/]+

in code:

str_extract_all(str1, "[^\\s/]+")

See the regex demo.

It will match 1 or more chars other than whitespace and /.

To avoid matching ./., you'll need to use something like

\w+(?:['-]\w+)*

in code:

str_extract_all(str1, "\\w+(?:['-]\\w+)*")

This matches 1+ word chars followed by 0+ sequences of ' or - followed by 1+ word chars, so tokens such as n't and CAD-support are kept whole. See this regex demo.
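To see how the two patterns differ, here is a minimal sketch run on a shortened sample in the same word/TAG format. The sample string and the variable names `tokens_all` and `tokens_words` are made up for illustration, and it assumes the stringr package is installed.

```r
library(stringr)

# Shortened, hypothetical sample in the same word/TAG format as the question.
str1 <- "I/PRP did/VBD n't/RB use/VB CAD-support/NNP ./."

# Pattern 1: runs of characters that are neither whitespace nor "/".
# Words and tags come back as separate matches, and the two "." pieces
# of "./." are matched as well.
tokens_all <- str_extract_all(str1, "[^\\s/]+")[[1]]

# Pattern 2: word characters, optionally joined by ' or -.
# "./." yields no match because "." is not a word character.
tokens_words <- str_extract_all(str1, "\\w+(?:['-]\\w+)*")[[1]]

print(tokens_all)
print(tokens_words)
```

Both calls return the words and the tags in order of appearance. The only difference on this sample is that the first pattern also returns the two "." items at the end.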

Upvotes: 1
