user1855851

Reputation: 27

How to remove comment from XML using shell script

I would like to get the port from server.xml, which is the Tomcat server configuration file.

My server.xml is below.

How can I get the port from server.xml while ignoring the commented-out part?

I only want to get 50000, not 8080.

<Connector port="50000"  maxHttpHeaderSize="8192" protocol="HTTP/1.1"
           maxThreads="2000" minSpareThreads="50" maxSpareThreads="150" />    
<!--
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
-->

Upvotes: 1

Views: 824

Answers (4)

Reino

Reputation: 3443

The title is rather misleading, because what you really want is to extract the value of the "port" attribute.
Please don't use RegEx to parse XML; use an XML parser like xidel or xmlstarlet instead.

$ xidel -s server.xml -e '//@port'

$ xmlstarlet sel -t -v '//@port' server.xml

Upvotes: 1

firfor

Reputation: 169

This is a sed script run from bash. I think it is the best solution using sed. If your system is macOS (e.g. a MacBook Pro), you should install GNU sed first.

sed -E  -e\
':start
/<!--/ {
   :loop
   /-->/ {
            s/-->/mockend102499883356/
            s/<!--.*mockend102499883356//
            /<!--/ {
                   b loop
            }
            b done
   }
   :add
   N
   b loop
   :done
}'  filename.xml;

If you can read Chinese, this blog post explains the code in detail: remove xml comment in pom.xml of maven project
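To try the script end to end, here is a minimal sketch. It writes the question's sample to a temporary file and pipes the sed output through grep so that only the port number is kept. It assumes GNU sed and a POSIX shell. The temporary file exists only for illustration. The sed program follows the same logic as the script above, with the unused :start and :add labels and the extra braces removed.

```shell
#!/bin/sh
# Sketch: strip XML comments with the sed loop from this answer (GNU sed assumed),
# then extract the remaining port value. The temp file is illustrative only.
set -eu

xml=$(mktemp)
trap 'rm -f "$xml"' EXIT

cat > "$xml" <<'EOF'
<Connector port="50000"  maxHttpHeaderSize="8192" protocol="HTTP/1.1"
           maxThreads="2000" minSpareThreads="50" maxSpareThreads="150" />
<!--
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
-->
EOF

# On "<!--", keep appending lines (N) until "-->" shows up, then delete
# everything from "<!--" through the marker that replaced "-->".
port=$(sed -E -e '
/<!--/ {
   :loop
   /-->/ {
      s/-->/mockend102499883356/
      s/<!--.*mockend102499883356//
      /<!--/ b loop
      b done
   }
   N
   b loop
   :done
}' "$xml" | grep -o 'port="[0-9]*"' | grep -o '[0-9][0-9]*')

echo "$port"   # prints 50000; the 8080 connector was inside the comment
```

On macOS, the same sketch should work if sed is replaced by gsed from Homebrew's gnu-sed package.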

Upvotes: 0

Samuel Kirschner

Reputation: 1185

The simplest solution I could come up with to remove all comments from a text file is:

| sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' | grep -zv '^<!--' | tr -d '\0' |

To explain:

The sed command inserts a null character before each <!-- and after each -->, like this:

<Connector port="50000"  maxHttpHeaderSize="8192" protocol="HTTP/1.1"
           maxThreads="2000" minSpareThreads="50" maxSpareThreads="150" />    
\0<!--
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
-->\0

Then grep -z treats that character as the "line separator", so grep -v '^<!--' removes the middle part (the record that starts with the comment). Finally, tr -d removes the \0 characters again so that any following greps won't treat the output as a binary file.

Just combine it with the grep you are already using, e.g.:

 sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' server.xml | grep -zv '^<!--' | tr -d '\0' | grep -o 'port="[0-9]*' | grep -o '[0-9]*$'

output:

50000
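Here is a small self-contained check of the pipeline. It assumes GNU sed (for the hex escape) and GNU grep (for -z). In addition to the question's sample, it adds one hypothetical line whose comment shares a line with real markup. That line shows why the NUL goes directly before <!-- and directly after --> rather than relying on line boundaries. The sketch uses \x00, which is the two-digit form of the same NUL escape.

```shell
#!/bin/sh
# Sketch: the NUL-separator trick, assuming GNU sed and GNU grep.
# The <Listener .../> line is hypothetical, added to show an inline comment.
set -eu

xml=$(mktemp)
trap 'rm -f "$xml"' EXIT

cat > "$xml" <<'EOF'
<Connector port="50000"  maxHttpHeaderSize="8192" protocol="HTTP/1.1"
           maxThreads="2000" minSpareThreads="50" maxSpareThreads="150" />
<!--
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
-->
<Listener className="Example" /> <!-- port="9090" -->
EOF

# 1. sed puts a NUL before every "<!--" and after every "-->",
#    so each comment becomes its own NUL-terminated record.
# 2. grep -z uses NUL as the record separator; -v drops records starting with "<!--".
# 3. tr deletes the NULs again so the following greps see plain text.
ports=$(sed 's/<!--/\x00<!--/g;s/-->/-->\x00/g' "$xml" |
  grep -zv '^<!--' |
  tr -d '\0' |
  grep -o 'port="[0-9]*' |
  grep -o '[0-9]*$')

echo "$ports"   # prints only 50000; 8080 and 9090 were inside comments
```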

Upvotes: 1

P....

Reputation: 18411

Here are a few variants, each producing a different result. Please check lookarounds for more details.

grep -oP 'port=.*? (?=maxHttpHeaderSize)' server.xml
port="50000"

grep -oP 'port=\K.*? (?=maxHttpHeaderSize)' server.xml
"50000"

grep -oP 'port="\K.*?(?="  maxHttpHeaderSize=)' server.xml
50000
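This sketch runs the three commands above against a temporary copy of the question's sample. It assumes GNU grep built with PCRE support (-P).

In these patterns:
- (?=...) is a lookahead: it must match, but it is not included in the output.
- \K drops everything matched before it from the reported match.

The first two variants also keep the two spaces before maxHttpHeaderSize. All three skip the 8080 connector only because its port attribute is not followed by maxHttpHeaderSize on the same line.

```shell
#!/bin/sh
# Sketch: the three PCRE variants from this answer (GNU grep with -P assumed).
set -eu

xml=$(mktemp)
trap 'rm -f "$xml"' EXIT

cat > "$xml" <<'EOF'
<Connector port="50000"  maxHttpHeaderSize="8192" protocol="HTTP/1.1"
           maxThreads="2000" minSpareThreads="50" maxSpareThreads="150" />
<!--
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
-->
EOF

# Lazy .*? stops at the first position where the lookahead succeeds.
with_attr=$(grep -oP 'port=.*? (?=maxHttpHeaderSize)' "$xml")
# \K discards the already-matched 'port=' from the output.
quoted=$(grep -oP 'port=\K.*? (?=maxHttpHeaderSize)' "$xml")
# \K after the opening quote, lookahead on the closing quote: digits only.
bare=$(grep -oP 'port="\K.*?(?="  maxHttpHeaderSize=)' "$xml")

printf '[%s]\n' "$with_attr" "$quoted" "$bare"
```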

Upvotes: 0
