Brick

Reputation: 119

Find words in file starting with character and prepend / append text

I am trying to write a log scraper that adds links to text.

For example, the log contains:

This is some text TK-12354 aasdgf asdf 
adsf aasdf TK-122 sadf sfdg   sfdgsdfg
dghgf sfdg sdfg  sdfg sdgf dsf TK-1243

I want to turn all of the 'TK-' words into links by wrapping each one like this:

<a href="https://website/browse/TK-######/">TK-######</a>

The text above would then look like this:

This is some text <a href="https://website/browse/TK-12354/">TK-12354</a> aasdgf asdf 
adsf aasdf <a href="https://website/browse/TK-122/">TK-122</a> sadf sfdg   sfdgsdfg
dghgf sfdg sdfg  sdfg sdgf dsf <a href="https://website/browse/TK-1243/">TK-1243</a>

I've come up with a way of doing it in bash, but it's really clunky and takes forever to run through the file:

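IFS=$'\n'
# Collect every line that mentions "TK-", prefixed with its line number ("12:...")
declare -a COMMENTS=($(grep -n "TK-" /usr/local/statusTEST.dat))

for COMMENT in "${COMMENTS[@]}"
    do
        # Line number: the field before the first ':'
        LINE=$(echo "$COMMENT" | cut -d : -f 1)
        # Ticket ID found on that line, e.g. TK-12354
        TICKET=$(echo "$COMMENT" | grep -o '\bTK-\w*')

        # Rewrites the whole file (and starts several processes) for every matching line, which is why this is slow
        sed -i "${LINE}s/$TICKET/\<a href\=\"https\:\/\/website.com\/browse\/$TICKET\"\>$TICKET\<\/a\>/g" "/usr/local/statusTEST.dat"

    done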

I've also tried using sed to read and change the whole file in one pass, but I can't quite get the syntax right for putting the ticket reference into the URL, or even for appending the closing </a> tag to the end, e.g.:

sed -i "s/\bTK-/\<a href\=\"https\:\/\/website.com\/browse\/g"

Does anyone have any ideas?

Upvotes: 2

Views: 63

Answers (1)

anubhava

Reputation: 785581

You can use a single sed command like this:

sed 's~TK-[0-9]*~<a href="https://website/browse/&/">&</a>~g' file

Output:

This is some text <a href="https://website/browse/TK-12354/">TK-12354</a> aasdgf asdf
adsf aasdf <a href="https://website/browse/TK-122/">TK-122</a> sadf sfdg   sfdgsdfg
dghgf sfdg sdfg  sdfg sdgf dsf <a href="https://website/browse/TK-1243/">TK-1243</a>
  • & is a back-reference to the complete match in sed.
  • I used ~ as the regex delimiter in sed to avoid excessive escaping of / in the replacement text.
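A quick way to check the command and the two notes above is to run it on the sample log from the question. This is a minimal sketch for a POSIX shell; it pipes the text through sed instead of reading a file named file, and https://website/browse/ is only the placeholder URL from the question.

```shell
#!/bin/sh
# Sketch: run the answer's sed command on the sample log from the question.
# https://website/browse/ is the question's placeholder base URL.
set -eu

input='This is some text TK-12354 aasdgf asdf
adsf aasdf TK-122 sadf sfdg   sfdgsdfg
dghgf sfdg sdfg  sdfg sdgf dsf TK-1243'

# & expands to the whole match (e.g. TK-12354); ~ is the delimiter of the
# s command, so the slashes in the URL need no escaping.
linked=$(printf '%s\n' "$input" | sed 's~TK-[0-9]*~<a href="https://website/browse/&/">&</a>~g')

printf '%s\n' "$linked"
```

Each ticket ID is substituted twice, once inside href and once as the link text, because & can be used as many times as needed in the replacement.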

In reply to a follow-up comment:

if I wanted to ignore entries that had already been done

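You can use this sed command with two capture groups; it matches a TK- string only when it is at the start of a line or preceded by a blank (space or tab), so tickets that are already inside a link are skipped: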

sed -E 's~(^|[[:blank:]])(TK-[0-9]+)~\1<a href="https://website/browse/\2/">\2</a>~g' file
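A ticket that has already been converted is preceded by / or > rather than a blank, so running this command a second time should leave the output unchanged. A small sketch to check that, assuming a sed that supports -E (GNU and BSD sed both do); the function name link_tickets is hypothetical and only used for illustration:

```shell
#!/bin/sh
# Sketch: check that the whitespace-anchored variant skips tickets that are
# already wrapped in a link, so it can be re-run safely on the same log.
set -eu

# Hypothetical helper wrapping the answer's command.
# \1 = start of line (empty) or the blank before the ticket, \2 = the ticket ID
link_tickets() {
    sed -E 's~(^|[[:blank:]])(TK-[0-9]+)~\1<a href="https://website/browse/\2/">\2</a>~g'
}

input='This is some text TK-12354 aasdgf asdf
adsf aasdf TK-122 sadf sfdg   sfdgsdfg'

once=$(printf '%s\n' "$input" | link_tickets)
# Second pass: TK- is now preceded by / or >, so nothing should match
twice=$(printf '%s\n' "$once" | link_tickets)

printf '%s\n' "$twice"
```

To rewrite the log in place, as the loop in the question did, adding -i.bak (for example sed -i.bak -E '...' /usr/local/statusTEST.dat) should work with both GNU and BSD sed and keeps the previous contents in statusTEST.dat.bak.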

Upvotes: 2
