Reputation: 1631
I have a few thousand text lines like this:
go to <CITY>rome</CITY> <COUNTRY>italy</COUNTRY>
My desired output replaces everything from the first tagged word (rome) to the last one (italy) with a single tagged span:
go to <ADDRESS>rome italy</ADDRESS>
I can match the portion of the text line which is tagged with:
<.*>
This greedily selects all text from the first < to the last >. I would then like the inner tags removed and <ADDRESS> and </ADDRESS> placed around the matched portion.
The possible tags are: <STREETNUM>, <STREET>, <CITY>, <STATE>, <ZIP> and <COUNTRY>. Any subset of these tags can appear, in any order. The tags are never nested.
I have searched SO and googled to no avail. Perhaps I can use a named capturing group and then apply a search/replace regex to it, but I don't know how. Any help would be appreciated.
Upvotes: 2
Views: 63
Reputation: 16039
This sed line will do it:
sed 's/<CITY>\(.*\)<\/CITY>.*<COUNTRY>\(.*\)<\/COUNTRY>/<ADDRESS>\1 \2<\/ADDRESS>/g'
For example:
sed 's/<CITY>\(.*\)<\/CITY>.*<COUNTRY>\(.*\)<\/COUNTRY>/<ADDRESS>\1 \2<\/ADDRESS>/g' <<< "go to <CITY>rome</CITY> <COUNTRY>italy</COUNTRY>"
It prints:
go to <ADDRESS>rome italy</ADDRESS>
It captures the contents of the CITY tag and of the COUNTRY tag, then replaces the whole match with the two captured values enclosed in an ADDRESS tag.
If your sed supports the -E flag (GNU sed on Linux does), you can avoid escaping the parentheses:
sed -E 's/<CITY>(.*)<\/CITY>.*<COUNTRY>(.*)<\/COUNTRY>/<ADDRESS>\1 \2<\/ADDRESS>/g'
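Note that this pattern is specific to a CITY tag followed by a COUNTRY tag. The following is a minimal sketch that wraps the -E pattern in a hypothetical function, fix_city_country (the name and the second input line are made up for illustration), to check how it treats other tag combinations:

```shell
# Hypothetical wrapper around the CITY/COUNTRY substitution.
fix_city_country() {
  sed -E 's/<CITY>(.*)<\/CITY>.*<COUNTRY>(.*)<\/COUNTRY>/<ADDRESS>\1 \2<\/ADDRESS>/g'
}

# A CITY tag followed by a COUNTRY tag is rewritten.
printf '%s\n' 'go to <CITY>rome</CITY> <COUNTRY>italy</COUNTRY>' | fix_city_country
# prints: go to <ADDRESS>rome italy</ADDRESS>

# Any other combination does not match, so the line passes through unchanged.
printf '%s\n' 'go to <STATE>texas</STATE> <ZIP>73301</ZIP>' | fix_city_country
# prints: go to <STATE>texas</STATE> <ZIP>73301</ZIP>
```

So on its own this command covers only one of the possible tag combinations.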
UPDATE:
To achieve the expected result you could run several commands in the following order:
Remove the go to text: sed 's/go to //g'
Remove the tag characters <, / and >: tr -d '</>'
Once all tag characters are removed, you can safely delete the words STREETNUM, STREET, CITY, STATE, ZIP and COUNTRY from the input:
sed -E 's/CITY|COUNTRY|STATE|ZIP|STREETNUM|STREET//g'
Take the output of the previous commands and wrap it in the <ADDRESS></ADDRESS> tags (-I {} is used here because GNU xargs documents -i as deprecated):
xargs -I {} echo "go to <ADDRESS>{}</ADDRESS>"
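To see what each of the first three steps contributes, the intermediate results can be inspected one at a time. This is only an illustrative sketch: the variable names step1, step2 and step3 are made up, and the sample line is the one from the question.

```shell
LINE='go to <CITY>rome</CITY> <COUNTRY>italy</COUNTRY>'

# Step 1: drop the fixed "go to " prefix.
step1=$(printf '%s\n' "$LINE" | sed 's/go to //g')

# Step 2: delete every <, / and > character, leaving bare tag names.
step2=$(printf '%s\n' "$step1" | tr -d '</>')

# Step 3: delete the tag names themselves.
step3=$(printf '%s\n' "$step2" | sed -E 's/CITY|COUNTRY|STATE|ZIP|STREETNUM|STREET//g')

printf '%s\n' "$step1" "$step2" "$step3"
# prints:
# <CITY>rome</CITY> <COUNTRY>italy</COUNTRY>
# CITYromeCITY COUNTRYitalyCOUNTRY
# rome italy
```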
The final command is the following, where $LINE contains the line to process:
sed 's/go to //g' <<< "$LINE" | tr -d '</>' | sed -E 's/CITY|COUNTRY|STATE|ZIP|STREETNUM|STREET//g' | xargs -I {} echo "go to <ADDRESS>{}</ADDRESS>"
An example:
Running:
sed 's/go to //g' <<< "go to <STATE>Bolivar</STATE> <COUNTRY>Venezuela</COUNTRY> <STREETNUM>5</STREETNUM> " | tr -d '</>' | sed -E 's/CITY|COUNTRY|STATE|ZIP|STREETNUM|STREET//g' | xargs -I {} echo "go to <ADDRESS>{}</ADDRESS>"
Will print:
go to <ADDRESS>Bolivar Venezuela 5 </ADDRESS>
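As a possible alternative to the pipeline, the same result might be reached with a single sed call that does not depend on the line starting with go to. This is a sketch under assumptions, not a tested solution for the real data: the function name wrap_address and the variable TAGS are made up, and it relies on the tags never being nested, as stated in the question.

```shell
# Hypothetical helper: wrap the whole tagged span of each line in <ADDRESS>.
TAGS='STREETNUM|STREET|CITY|STATE|ZIP|COUNTRY'

wrap_address() {
  # 1st expression: turn the first opening tag into <ADDRESS>.
  # 2nd expression: the greedy (.*) runs up to the last closing tag,
  #                 which becomes </ADDRESS>.
  # 3rd expression: delete every remaining known tag in between.
  sed -E \
    -e "s/<($TAGS)>/<ADDRESS>/" \
    -e "s/(.*)<\/($TAGS)>/\1<\/ADDRESS>/" \
    -e "s/<\/?($TAGS)>//g"
}

printf '%s\n' 'go to <CITY>rome</CITY> <COUNTRY>italy</COUNTRY>' | wrap_address
# prints: go to <ADDRESS>rome italy</ADDRESS>
```

Because only complete tags such as <STATE> or </STATE> are removed, an uppercase word in the text that happens to equal a tag name is left alone, and lines without any tags pass through unchanged.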
Upvotes: 2