carrieje

Reputation: 495

Sed: substitute pattern, not limited to matching line, but to another pattern

I would like to surround multiple words with quotes, which is easy to do with sed and grouping.

The catch is that my words are located in an attribute of an XML tag.

<daddy>
    <son name="blabla">
        <belongs having="car cat doll" color="yellow" />
    </son>
</daddy>

I want the having attribute to be post-processed to "'car' 'cat' 'doll'". having is a unique attribute name that only ever appears in a belongs tag, so there is no danger in matching on that word alone. I think this makes sed a reasonable choice here, without resorting to heavier tools such as XML parsers.

My first attempt was to use the pattern as an address to select the lines, and then surround the words. But the substitution applies to the whole selected line, not only to the part matched by the first pattern, which is not what I wanted.

sed "/having=\"[a-z ]\+\"/ s/\([a-z]\+\)/'\1'/g"


<daddy>
    <son name="blabla">
        <'belongs' 'having'="'car' 'cat' 'doll'" 'color'="'yellow'" />
    </son>
</daddy>

My second attempt, with group matching, got me no further: only the last word is kept.

sed "s/having=\"\(\([a-z]\+\) \?\)*\"/having=\"'\2'\"/g"


<daddy>
    <son name="blabla">
        <belongs having="'doll'" color="yellow"/>
    </son>
</daddy>

Upvotes: 1

Views: 102

Answers (2)

carrieje

Reputation: 495

I decided to give up on using only sed. What I did instead is awful and prone to substitution errors, but I will diff my outputs afterwards.

#!/bin/bash

O=$IFS

# For every file passed in argument
for f in "$@"
do
    IFS=$(echo -en "\n\b")
    # For every field content
    for p in $(egrep -o 'having="[^"]*"' $f | egrep -o '".*"' | grep -v '&quote;' | sort -u);
    do
        # Match every occurrence of this content on the lines of "having" and surround its words
        sed "/having/ s/$p/$(echo $p | sed 's/\([a-z]\+\)/\&quote;\1\&quote;/g')/" $f -i
    done
    IFS=$O
done

Upvotes: 0

NeronLeVelu

Reputation: 10039

sed ":a
/having/ {
   s/\(having=\"\( *'[^ '\"]\{1,\}'\)* *\)\([^ '\"]\{1,\}\)/\1'\3'/
   t a
   }" YourFile

Each pass replaces the first unquoted word (a run of characters that are not a space, a single quote, or a double quote) with itself surrounded by single quotes. The pattern first captures, right after the opening double quote, the run of words that are already single-quoted, then the next bare word, and `t a` loops back to `:a` as long as a substitution was made. The `g` flag cannot do this on its own, because every match has to start again from the opening double quote. The workaround is to capture all previously quoted words as one big group and cycle until no unquoted word is left.

I assume that the attribute content is on one line (because sed works line by line by default) and on the same line as having.
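
To see the loop at work, it can be wrapped in a small shell function and fed the sample line from the question. This is only a minimal sketch, assuming a POSIX sed. The function name quote_having is hypothetical, and anchoring the pattern on having=" is an assumption of this sketch, made so that other attributes on the same line (such as color) are left alone.

```shell
#!/bin/sh

# Hypothetical helper: reads XML-ish text on stdin and single-quotes every
# word inside a having="..." attribute, one word per pass of the loop.
quote_having() {
    sed ":a
/having=\"/ {
    s/\(having=\"\( *'[^ '\"]\{1,\}'\)* *\)\([^ '\"]\{1,\}\)/\1'\3'/
    t a
}"
}

# Usage: group 1 holds having=\" plus the words already quoted,
# group 3 is the next bare word, and 't a' repeats until nothing is left.
printf '%s\n' '<belongs having="car cat doll" color="yellow" />' | quote_having
# prints: <belongs having="'car' 'cat' 'doll'" color="yellow" />
```

Lines that do not contain having=" pass through unchanged, and the loop stops as soon as the substitution fails, because a word that is already quoted can no longer match the bare-word group.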

Upvotes: 1
