xchange

Reputation: 485

regex (preferably sed) to replace any occurrence of a certain char within a specific pattern match

In an arbitrary string, how do you replace every % with ^% except where it is inside a double-quoted substring?

e.g.

r%"%a"%n"%d%"o%"%%m to become

r^%"%a"^%n"%d%"o^%"^%^%m


A related topic has been discussed elsewhere. However, its sed solution does not fully apply here. In fact, it would cause an infinite loop, because its branch to :t appears to expect the char in question to be substituted in every case, whereas here it is left untouched when there is no pattern match.

As for the non-sed solutions, I had hoped to translate some of their techniques to sed, but they all seem to be specific to the tools they are used with. So, although sed is probably not the most efficient choice, I'd appreciate answers that focus on it.

Upvotes: 0

Views: 112

Answers (4)

Wiktor Stribiżew

Reputation: 627336

I'd suggest using perl in such cases:

perl -pe 's/"[^"]*"(*SKIP)(*F)|%/^%/g'

See the regex demo.

Details:

  • "[^"]*"(*SKIP)(*F) - matches a ", then zero or more chars other than ", and then a " (so, basically "..." substrings with no " inside) and then skips this match and starts looking for the next match after the closing " position
  • | or
  • % - match % in any other context.

Upvotes: 2

sam V

Reputation: 1

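sed -E ':a;s/^(([^"]|"[^"%]*")*"[^"%]*)%([^"]*")/\1\n\3/;ta;s/%/^%/g;s/\n/%/g' <<< 'r%"%a"%n"%d%"o%"%%m'

With GNU sed, this command first hides every % that is inside quotes behind a placeholder, then replaces the remaining % with ^% and restores the hidden ones. Here's a breakdown of the command:

  • -E: Enables extended regular expressions.
  • :a: Defines a label a for the loop.
  • s/^(([^"]|"[^"%]*")*"[^"%]*)%([^"]*")/\1\n\3/: Anchored at the start of the line so that quotes are paired from left to right, finds the first % inside a "..." pair and replaces it with a newline, which cannot otherwise occur in the pattern space.
  • ta: If a replacement was made, jumps back to label a. The loop ends because every pass removes one % from inside a quoted part.
  • s/%/^%/g: Replaces every remaining %, all of which are outside quotes, with ^%.
  • s/\n/%/g: Turns the newline placeholders back into %.

This transforms your input string r%"%a"%n"%d%"o%"%%m to r^%"%a"^%n"%d%"o^%"^%^%m.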

Upvotes: -2

anubhava

Reputation: 785856

Using GNU awk it is fairly straightforward:

s='r%"%a"%n"%d%"o%"%%m'
awk -v RS='"[^"]*"' '{gsub(/%/, "^&"); ORS=RT} 1' <<< "$s"

r^%"%a"^%n"%d%"o^%"^%^%m

What it is doing:

  • Using -v RS='"[^"]*"' tells awk that the record separator is each quoted text, i.e. "..."
  • Using gsub(/%/, "^&") replaces each remaining % (outside the quotes) with ^%
  • Setting ORS=RT makes the output record separator the input text that matched -v RS='...'
  • 1 outputs each record

Upvotes: 2

Renaud Pacalet

Reputation: 29290

Because of the "not part of a quote" constraint, sed is probably not the easiest choice. With awk you could declare the double quote as the input and output field separator and skip the quoted parts by modifying only the odd and last fields.

The following assumes that there are no ^% in the input, or that they shall be replaced with ^^%:

$ awk '
BEGIN {FS = OFS = "\""}
{for(i = 1; i <= NF; i += (i == NF - 1 ? 1 : 2))
   gsub(/%/, "^%", $i)}
1' <<< 'r%"%a"%n"%d%"o%"%%m'
r^%"%a"^%n"%d%"o^%"^%^%m
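
To see why only the odd and last fields are touched, it can help to print how awk splits the sample line on double quotes. A small sketch, assuming any POSIX awk; the labels outside and quoted and the shell variables are made up for the illustration.

```shell
# "input" and "fields" are hypothetical names used only in this sketch.
input='r%"%a"%n"%d%"o%"%%m'

# Split on double quotes and label each field: even-numbered fields sit
# between a pair of quotes, except the last field, which has no closing quote.
fields=$(printf '%s\n' "$input" | awk -F'"' '{
  for (i = 1; i <= NF; i++) {
    where = (i % 2 == 0 && i < NF) ? "quoted" : "outside"
    printf "%d %s: %s\n", i, where, $i
  }
}')

printf '%s\n' "$fields"
# prints:
# 1 outside: r%
# 2 quoted: %a
# 3 outside: %n
# 4 quoted: %d%
# 5 outside: o%
# 6 outside: %%m
```

Field 6 follows a quote that is never closed, so it counts as outside, which is what the i == NF - 1 step in the loop above accounts for.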

Upvotes: 2
