Sophia

Reputation: 3

Matching a quantifier of 1 or less

I have a line of text that will contain several tilde characters (~). In this scenario, I am working with a string of data that is a report. Each tilde represents a line break/carriage return. What I need to do is match only the single tildes (so that I can then turn them into spaces). I want to leave the groups of multiple tildes as is. I am able to match groups of multiple tildes (using ~{2,}). However, I need to match the opposite of that.

Here is a sample of text I am trying to modify:

FINDINGS:~~VASCULAR: The IVC~~~ ~ ~~~~~~~~~~~~, and portal vein appear normal. The aorta is normal in~calibre without~aneurysm

In this example, I would like to match all 3 tildes that are not in a group of other tildes. The "real world" data will contain many tildes throughout, with the possibility of some being at the beginning and/or the end of the string. They may be surrounded by either spaces or other characters.

Thank you in advance for your help!

Upvotes: 0

Views: 78

Answers (2)

glenn jackman

Reputation: 247062

set new [regsub -all {(^|[^~])~([^~]|$)} $str {\1 \2}]

Now that I have some time, some words.

We're looking for a tilde that is not preceded by a tilde and is not followed by a tilde. We could try {[^~]~[^~]}, which does exactly that. However, that expression requires that there actually be a character before and after: what if the single tilde we're looking for occurs at the beginning or the end of the line? So, we want:

  • the beginning of string OR a non-tilde character (^|[^~]), followed by
  • a tilde, followed by
  • a non-tilde character OR the end of string ([^~]|$).

We need to use capturing parentheses to remember what characters occurred before and after the tilde that we're turning into a space, so the replacement string is {\1 \2} == the character captured by the first set of parentheses, then a space, then the character captured by the second set of parentheses.
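As a quick check of the command above, here is a minimal sketch that wraps it in a proc and runs it on a few short strings. The proc name single_tildes_to_spaces and the test strings are made up for illustration; only the regsub call itself comes from this answer.

```tcl
# Hypothetical helper wrapping the regsub from this answer.
proc single_tildes_to_spaces {str} {
    # \1 and \2 put back the characters captured on either side of the tilde.
    return [regsub -all {(^|[^~])~([^~]|$)} $str {\1 \2}]
}

puts [single_tildes_to_spaces "in~calibre"]      ;# single tilde in the middle
puts [single_tildes_to_spaces "~start and end~"] ;# single tildes at both ends
puts [single_tildes_to_spaces "IVC~~~ ~ ~~,"]    ;# groups of two or more are left alone
```

At the start or end of the string the corresponding capture group is empty, so \1 or \2 contributes nothing and only the space is inserted.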

We Tcl users are lucky to have the regex engine we have. It is highly performant and very feature-full.

Upvotes: 2

coPro

Reputation: 126

Edit: I didn't realize Tcl doesn't support lookbehinds. Glenn's answer seems to have it covered, though.

You could try this. It uses a negative lookbehind and a negative lookahead to make sure the tilde isn't preceded or followed by another tilde.

(?<!~)~(?!~)
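Tcl's regex engine supports lookahead but not lookbehind, so one way to adapt this idea to Tcl is to keep a capture group for the preceding character and use only the negative lookahead. The sketch below is an assumption about how that could look, not something tested in either answer; the proc name single_tildes_to_spaces_la is hypothetical.

```tcl
# Hypothetical helper: the capture group stands in for the lookbehind,
# and (?!~) is a negative lookahead, which Tcl's advanced REs support.
proc single_tildes_to_spaces_la {str} {
    # Only the preceding character is consumed, so only \1 needs restoring.
    return [regsub -all {(^|[^~])~(?!~)} $str {\1 }]
}

puts [single_tildes_to_spaces_la "a~b~c"]      ;# both single tildes are replaced
puts [single_tildes_to_spaces_la "x~~y ~ z~"]  ;# the pair stays, the singles go
```

Because the lookahead does not consume the character after the tilde, that character is still available as the "before" character for the next match. That matters for input like a~b~c, where single tildes are only one character apart: a pattern that captures the following character would use up the b on the first match and, as far as I can tell, leave the second tilde untouched.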


Upvotes: 0
