Reputation: 119
I am trying to write a log scraper that will add links to text.
For example, the log contains:
This is some text TK-12354 aasdgf asdf
adsf aasdf TK-122 sadf sfdg sfdgsdfg
dghgf sfdg sdfg sdfg sdgf dsf TK-1243
I want to turn all of the 'TK-' words into links by wrapping each one like this:
<a href="https://website/browse/TK-######/">TK-######</a>
So the above text would look like this:
This is some text <a href="https://website/browse/TK-12354/">TK-12354</a> aasdgf asdf
adsf aasdf <a href="https://website/browse/TK-122/">TK-122</a> sadf sfdg sfdgsdfg
dghgf sfdg sdfg sdfg sdgf dsf <a href="https://website/browse/TK-1243/">TK-1243</a>
I've come up with a way of doing it in bash, but it's really clunky and takes forever to run through the file:
IFS=$'\n'
declare -a COMMENTS=($(cat /usr/local/statusTEST.dat | grep -n "TK-"))
for COMMENT in "${COMMENTS[@]}"
do
LINE=`echo $COMMENT | cut -d : -f 1`
TICKET=`echo $COMMENT | grep -o '\bTK-\w*'`
sed -i "${LINE}s/$TICKET/\<a href\=\"https\:\/\/website.com\/browse\/$TICKET\"\>$TICKET\<\/a\>/g" "/usr/local/statusTEST.dat"
done
I've also tried using sed to read and change the whole file in one pass, but I can't quite get the syntax to work for adding the ticket reference into the URL, or even for appending the closing tag to the end, i.e.
sed -i "s/\bTK-/\<a href\=\"https\:\/\/website.com\/browse\/g"
Does anyone have any ideas?
Upvotes: 2
Views: 63
Reputation: 785581
You can use a single sed command like this:
sed 's~TK-[0-9]*~<a href="https://website/browse/&/">&</a>~g' file
This is some text <a href="https://website/browse/TK-12354/">TK-12354</a> aasdgf asdf
adsf aasdf <a href="https://website/browse/TK-122/">TK-122</a> sadf sfdg sfdgsdfg
dghgf sfdg sdfg sdfg sdgf dsf <a href="https://website/browse/TK-1243/">TK-1243</a>
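For comparison, here is a small sketch that runs the same substitution twice, once with ~ as the delimiter and once with the default / delimiter, where every slash in the URL has to be escaped. The sample line is taken from the question; the variable names are only illustrative.

```shell
line='adsf aasdf TK-122 sadf sfdg sfdgsdfg'

# ~ as the delimiter: slashes in the URL need no escaping
with_tilde=$(printf '%s\n' "$line" |
  sed 's~TK-[0-9]*~<a href="https://website/browse/&/">&</a>~g')

# default / delimiter: every / in the replacement must be escaped
with_slash=$(printf '%s\n' "$line" |
  sed 's/TK-[0-9]*/<a href="https:\/\/website\/browse\/&\/">&<\/a>/g')

printf '%s\n' "$with_tilde"
[ "$with_tilde" = "$with_slash" ] && echo "both forms give the same result"
```

Both commands should produce the same linked line; the ~ version is simply easier to read and to get right.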
& is a back-reference to the complete match in sed.
~ is used as the regex delimiter in sed to avoid excessive escaping of / in the replacement text.
As per comment below:
if I wanted to ignore entries that had already been done
You may use this sed command with 2 capture groups, which matches TK strings only if they are preceded by whitespace or appear at the start of a line:
sed -E 's~(^|[[:blank:]])(TK-[0-9]+)~\1<a href="https://website/browse/\2/">\2</a>~g' file
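As a quick check of that behaviour, the sketch below wraps the command above in a shell function (the name linkify and the sample line are made up for illustration) and runs it twice. In an already-linked entry, TK- is preceded by / or >, not by whitespace, so the second pass should leave the text unchanged.

```shell
# Hypothetical helper: reads stdin, writes linked text to stdout.
# The sed expression is the one from the answer above.
linkify() {
  sed -E 's~(^|[[:blank:]])(TK-[0-9]+)~\1<a href="https://website/browse/\2/">\2</a>~g'
}

# One plain ticket and one ticket that has already been linked
sample='adsf TK-122 sadf <a href="https://website/browse/TK-9/">TK-9</a>'

once=$(printf '%s\n' "$sample" | linkify)
twice=$(printf '%s\n' "$once" | linkify)

printf '%s\n' "$once"
[ "$once" = "$twice" ] && echo "second pass changed nothing"
```

To update the log itself, one option is to redirect the output to a temporary file and move it over the original; GNU sed also accepts -i for in-place editing, as used in the question.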
Upvotes: 2