Reputation: 339
I have a regex that pulls out the data I am looking for in a block of text:
([A-Z]+A{5,})
This will select the code I am looking for in the following sample text:
Use these licenses with the VMware ESX build.
Feature License Code Description
------------------- ---------------------------- --------------------------------------------
CIFS CAYHXPKBFDUFZGABGAAAAAAAAAAA CIFS protocol
FCP APTLYPKBFDUFZGABGAAAAAAAAAAA Fibre Channel Protocol
My desired end result is a replace on the document that yields a text document containing only the matched codes, separated by commas:
CAYHXPKBFDUFZGABGAAAAAAAAAAA,APTLYPKBFDUFZGABGAAAAAAAAAAA
Upvotes: 1
Views: 115
Reputation: 48751
You could add an alternation to your regex like this:
([A-Z]+A{5,})|\X
Then replace it with:
(?1$1,)
The replacement string means: if the first capturing group matched, replace the match with $1 followed by a comma; otherwise replace it with nothing.
In the comments I added a negative lookahead to avoid adding a comma after a matched substring found at the end, but an extra trailing comma is inevitable with this regex.
A better approach:
(\b[A-Z]++\b(?<=A{5}))|\X
This uses a possessive quantifier and a lookbehind for the ending As. You don't need to look for A{5,}; you only need to look for A{5}. The word boundaries could be removed if you want to match such strings even when they are found in the middle of a longer word.
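If a script is an option instead of an editor replace, the trailing comma can be avoided by collecting the matches and joining them. This is only a minimal sketch in Python, which is an assumption (the question does not name a tool); the variable names are made up for illustration, and the lookbehind variant uses a plain + because possessive quantifiers are only available in Python's re module from 3.11 on.

```python
import re

# Sample text from the question.
sample = """Use these licenses with the VMware ESX build.
Feature License Code Description
------------------- ---------------------------- --------------------------------------------
CIFS CAYHXPKBFDUFZGABGAAAAAAAAAAA CIFS protocol
FCP APTLYPKBFDUFZGABGAAAAAAAAAAA Fibre Channel Protocol
"""

# Pattern from the question: uppercase letters ending in at least five A's.
codes = re.findall(r"[A-Z]+A{5,}", sample)

# Lookbehind variant: a whole uppercase word whose last five letters are A.
# Plain + is used here instead of the possessive ++ (an assumption made
# so the sketch also runs on Python versions older than 3.11).
codes_lookbehind = re.findall(r"\b[A-Z]+\b(?<=A{5})", sample)

# Joining the matches puts commas only between codes, never at the end.
result = ",".join(codes)
print(result)
```

Because the comma is added by the join rather than by the replacement string, no extra alternative such as \X is needed to delete the surrounding text.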
Upvotes: 3