Pravin Satav

Reputation: 702

Extracting XML tag names from an XML file with a Unix script/command

This is my sample file:

<?xml version="1.0" encoding="UTF-8" ?>
 <testjar>
 <testable>
  <trigger>Trigger1</trigger>
  <message>2012-06-14T00:03.54</message>
 <sales-info>
  <san-a>no</san-a>
  <san-b>no</san-b>
  <san-c>no</san-c>
  </sales-info>
  </testable>
  </testjar>

I need to extract the XML tag names from it.

For example, the output for the file above should be:

testjar
testable
trigger
message
sales-info
....

Upvotes: 1

Views: 727

Answers (2)

$> cat ./text
<?xml version="1.0" encoding="UTF-8" ?>
 <testjar>
 <testable>
  <trigger>Trigger1</trigger>
  <message>2012-06-14T00:03.54</message>
 <sales-info>
  <san-a>no</san-a>
  <san-b>no</san-b>
  <san-c>no</san-c>
  </sales-info>
  </testable>
  </testjar>

Then run:

$> grep -P -o "(?<=\<)[^>?/]*(?=\>)" ./text
testjar
testable
trigger
message
sales-info
san-a
san-b
san-c

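The regular expression (?<=\<)[^>?/]*(?=\>) consists of three parts:

  • (?<=\<): (?<=...) is the lookbehind operator, so this means "preceded by <";

  • [^>?/]*: any character other than >, ?, or /, zero or more times. Excluding ? and / is what skips the <?xml ... ?> declaration and closing tags such as </trigger>;

  • (?=\>): (?=...) is the lookahead operator, so this means "followed by >".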
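The same lookaround idea can be tightened a little so that attributes, self-closing tags, and comments do not leak into the output, and so that repeated names are printed only once. This is a minimal sketch, assuming GNU grep built with PCRE support (-P); the function name list_tags is made up for illustration, and a regex like this is not a real XML parser (it will be fooled by CDATA sections, for instance):

```shell
# Hypothetical helper: print each distinct element name in an XML file,
# in order of first appearance. Assumes GNU grep with -P (PCRE) support.
list_tags() {
    # (?<=<)          the name must come right after "<"
    # [^>?/!\s]+      one or more chars that are not > ? / ! or whitespace,
    #                 which rejects </closing>, <?xml ...?> and <!-- ... -->
    # (?=[\s/>]|$)    the name ends at whitespace, "/", ">" or end of line,
    #                 so attributes and "/>" are left out
    grep -P -o '(?<=<)[^>?/!\s]+(?=[\s/>]|$)' "$1" |
        awk '!seen[$0]++'   # drop repeats, keep first-seen order
}
```

Called as list_tags ./text on the file above, this should print the same eight names as the original command, since that file has no attributes or repeated tags.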

Upvotes: 3

cppcoder

Reputation: 23145

awk -F">" '{print $1}' xmlfile | sed -e '/<\//d' -e '/<?/d' -e 's/^[[:space:]]*<//'

Upvotes: 0
