vahdet

Reputation: 6729

Remove namespace prefix with sed

I want to convert this piece of XML:

<v1:table>
  <v1:tr>
    <v1:td>Apples</v1:td>
    <v1:td>Bananas</v1:td>
  </v1:tr>
</v1:table>

into the following, using sed to remove the namespace prefix (i.e. v1):

<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>

Is it possible?

EDIT: I should add that the XML is kept in a file.

Upvotes: 3

Views: 1570

Answers (2)

Benjamin W.

Reputation: 52181

Here's how you could do it with hxpipe and hxunpipe from the W3C HTML-XML-utils (packaged for many distributions):

$ hxpipe infile | sed 's/^\([()]\)v1:/\1/g' | hxunpipe
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>

hxpipe parses XML/HTML and turns it into an awk/sed-friendly, line-based format:

$ hxpipe infile
(v1:table
-\n  
(v1:tr
-\n    
(v1:td
-Apples
)v1:td
-\n    
(v1:td
-Bananas
)v1:td
-\n  
)v1:tr
-\n
)v1:table
-\n

Here, lines starting with ( and ) are opening and closing tags, so removing the v1: that directly follows a leading ( or ) (which is what the sed command above does) achieves the desired effect. Text lines start with a -, so the anchored pattern can't produce false positives on text content.
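To see why the anchor matters, here is a minimal sketch that runs the same substitution on a few hand-written lines in the hxpipe format, so it works without HTML-XML-utils installed. The sample lines, including the `-v1:note` text line, are made up for illustration and are not from the question's file.

```shell
# Simulated hxpipe output (hypothetical sample; the "-v1:note" text line is
# invented to show that text lines are left alone).
sample=$(printf '%s\n' '(v1:td' '-v1:note' ')v1:td')

# Same substitution as above: strip "v1:" only right after a leading ( or ).
result=$(printf '%s\n' "$sample" | sed 's/^\([()]\)v1:/\1/')

printf '%s\n' "$result"
```

The two tag lines become `(td` and `)td`, while the text line keeps its `v1:` because it starts with `-` rather than `(` or `)`.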

Upvotes: 4

anubhava

Reputation: 785316

This sed works for your example:

sed -E 's~(</?)v1:~\1~g' file

<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>

Note, however, that sed is not the best tool for parsing HTML/XML. Consider using a proper HTML/XML parser instead.
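As a small illustration of that caveat, the sketch below applies the same substitution to a hypothetical root element that also carries a namespace declaration. The `xmlns:v1="urn:example"` attribute is an assumption added for the example and does not appear in the question's XML.

```shell
# Hypothetical input: a root element that also declares the v1 namespace.
line='<v1:table xmlns:v1="urn:example"><v1:td>Apples</v1:td></v1:table>'

# Same substitution as in the answer: drop "v1:" right after "<" or "</".
out=$(printf '%s\n' "$line" | sed -E 's~(</?)v1:~\1~g')

printf '%s\n' "$out"
```

The element names lose their prefix, but the `xmlns:v1` declaration is left in place, because the pattern only looks at text immediately after `<` or `</`. Whether that leftover attribute matters depends on what consumes the output.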

Upvotes: 1
