Reputation: 142
I am working with a data set that needs to be scrubbed. I want to replace the question marks (?) with the em-dash code (—). Here is an example string:
"...shut it down?after taking a couple of..."
I can match that instance with this expression: \w\?\w. However, it also matches one character on either side of the question mark, so the replacement looks like this:
"...shut it dow—fter taking a couple of..."
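A minimal JavaScript sketch of the problem (JavaScript is used only because one of the answers below uses it; the variable names are illustrative). Because each `\w` consumes a character, the letters on both sides of the `?` become part of the match and are replaced along with it:

```javascript
// Example input taken from the question.
const original = "...shut it down?after taking a couple of...";

// \w\?\w consumes the letter before and the letter after the "?",
// so all three characters ("n?a") are replaced by the em-dash.
const naive = original.replace(/\w\?\w/, "—");

console.log(naive); // ...shut it dow—fter taking a couple of...
```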
How can I match the whole pattern but replace only the question mark?
Thanks in advance, Jason
Upvotes: 0
Views: 493
Reputation: 28739
Use: /\b\?\b/
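\b matches a word boundary, which seems to be what you're after. It is zero-width, so the letters on either side of the question mark are checked but not consumed, and only the ? is replaced.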
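A small JavaScript sketch of the word-boundary approach, assuming the goal is to replace every such question mark (hence the `g` flag). `dashQuestionMarks` is a hypothetical helper name. Since `?` is itself a non-word character, a boundary on each side of it can only exist when a word character sits immediately before and after it:

```javascript
// \b is zero-width: it asserts a word/non-word transition without
// consuming any characters. Because "?" is a non-word character,
// \b\?\b only matches a "?" with word characters directly on both sides.
function dashQuestionMarks(text) {
  return text.replace(/\b\?\b/g, "—");
}

console.log(dashQuestionMarks("...shut it down?after taking a couple of..."));
// ...shut it down—after taking a couple of...

// A "?" followed by a space is left alone: "?" and " " are both
// non-word characters, so there is no boundary between them.
console.log(dashQuestionMarks("Is it done? Yes."));
// Is it done? Yes.
```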
Upvotes: 2
Reputation: 94153
If the language you are using supports lookarounds, you could use them to make sure your question mark is surrounded by word characters without including those characters in the match:
/(?<=\w)\?(?=\w)/
The (?<=\w) is a lookbehind (the engine looks "behind", i.e. before, a potential match) and the (?=\w) is a lookahead (the engine looks ahead). Lookarounds are zero-width: they check the surrounding characters without consuming them, so in your case only the question mark is matched, and that is what gets replaced.
In PHP, for example, you could do:
$string = "...shut it down?after taking a couple of...";
$string = preg_replace('/(?<=\w)\?(?=\w)/', "—", $string);
// $string is now "...shut it down—after taking a couple of..."
Lookarounds are supported by PCRE-based (Perl-compatible) regular expression engines. Support for lookbehind elsewhere varies: Ruby 1.8 doesn't support it (Ruby 1.9 and later do), and JavaScript only gained it in ES2018.
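For comparison, a sketch of the same lookaround pattern in JavaScript, assuming an ES2018+ engine (for example a current Node.js) so that lookbehind is available. `dashWithLookarounds` is a hypothetical helper name, and the `g` flag is added so every qualifying question mark is replaced:

```javascript
// (?<=\w) asserts a word character just before the "?", and (?=\w)
// asserts one just after it. Neither is part of the match, so only
// the "?" itself is replaced.
function dashWithLookarounds(text) {
  return text.replace(/(?<=\w)\?(?=\w)/g, "—");
}

console.log(dashWithLookarounds("...shut it down?after taking a couple of..."));
// ...shut it down—after taking a couple of...
```

Because the neighbouring letters are never consumed, one letter can satisfy both the lookahead of one match and the lookbehind of the next, so `"a?b?c"` becomes `"a—b—c"`.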
Upvotes: 3
Reputation: 70414
It's hard to answer without knowing which technology you're using. If you're writing JavaScript, this will do it:
inputStr.replace(/(\w)\?(\w)/g, '$1—$2');
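A sketch of the capture-group approach on a hypothetical input with more than one question mark. `String.prototype.replace` with a regular expression replaces every match only when the `g` flag is set; `$1` and `$2` write the captured letters back around the em-dash. It also shows one limitation worth knowing about for a data set: the captured letters are still consumed, so a single letter between two question marks can only take part in one match.

```javascript
// Capture the letters around the "?" and put them back with $1 and $2.
// The g flag makes replace() handle every match, not just the first.
const input = "first?second and third?fourth";
const result = input.replace(/(\w)\?(\w)/g, "$1—$2");
console.log(result); // first—second and third—fourth

// Caveat: "b" is consumed by the first match ("a?b"), so the second
// "?" no longer has a letter available before it and is left as-is.
const overlapping = "a?b?c".replace(/(\w)\?(\w)/g, "$1—$2");
console.log(overlapping); // a—b?c
```

If adjacent question marks like this can occur, the lookaround or word-boundary patterns avoid the problem because they never consume the neighbouring letters.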
Upvotes: 2
Reputation: 120644
If it is PHP (I'm basing that on other questions you have asked), this should do it:
$str = preg_replace('/(\w)\?(\w)/', '\\1—\\2', $str); // \w already covers both cases, so the /i flag isn't needed
Upvotes: 3