Reputation: 702
This is my sample file:
<?xml version="1.0" encoding="UTF-8" ?>
<testjar>
<testable>
<trigger>Trigger1</trigger>
<message>2012-06-14T00:03.54</message>
<sales-info>
<san-a>no</san-a>
<san-b>no</san-b>
<san-c>no</san-c>
</sales-info>
</testable>
</testjar>
I need to extract the XML tag names from this file.
For example, the output for the file above should be:
testjar
testable
trigger
message
sales-info
....
Upvotes: 1
Views: 727
Reputation: 21990
$> cat ./text
<?xml version="1.0" encoding="UTF-8" ?>
<testjar>
<testable>
<trigger>Trigger1</trigger>
<message>2012-06-14T00:03.54</message>
<sales-info>
<san-a>no</san-a>
<san-b>no</san-b>
<san-c>no</san-c>
</sales-info>
</testable>
</testjar>
And
$> grep -P -o "(?<=\<)[^>?/]*(?=\>)" ./text
testjar
testable
trigger
message
sales-info
san-a
san-b
san-c
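The regular expression (?<=\<)[^>?/]*(?=\>) consists of three parts:
(?<=\<) : (?<=...) is the lookbehind operator, so this means "preceded by <";
[^>?/]* : any character other than >, ? or /, zero or more times;
(?=\>) : (?=...) is the lookahead operator, so this means "followed by >".
Excluding ? and / is what skips the <?xml ... ?> declaration and closing tags such as </testjar>. Note that -P (Perl-compatible regular expressions) is a GNU grep option.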
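The pattern above matches only tags without attributes, because the character class has to run all the way to the closing >. If the input may contain attributes or self-closing tags, a variant that stops at the first whitespace might look like the sketch below. The function name extract_tags is made up for illustration, and the sketch still assumes GNU grep with -P support and no tag-like text inside comments or CDATA sections.

```shell
# Hypothetical helper (not from the original answer): print the element
# name of every opening or self-closing tag in the file given as $1.
extract_tags() {
  # (?<=<)       position immediately after a "<"
  # [^\s>?/!]+   the name: one or more characters, stopping at whitespace,
  #              ">", "?", "/" or "!" (so "</x>", "<?xml" and "<!--" never match)
  grep -oP '(?<=<)[^\s>?/!]+' "$1"
}
```

Called as extract_tags ./text, it prints one name per opening tag; piping the result through sort -u would reduce that to a list of distinct names.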
Upvotes: 3
Reputation: 23145
awk -F">" '{print $1}' xmlfile | sed -e '/<\//d' -e '/<?/d' -e 's/<//g'
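The pipeline above relies on each line holding at most one opening tag with no attributes: awk keeps everything before the first >, and sed then deletes closing tags and the <?xml declaration and strips the leading <. Where that assumption does not hold, or where grep -P is unavailable, a rough alternative using only tr and sed could look like this. The function name extract_tags_portable is hypothetical, and the sketch assumes element names start with an ASCII letter or underscore and that no tag-like text appears inside comments or CDATA sections.

```shell
# Hypothetical helper (not from the original answer): list element names
# using only tr and sed.
extract_tags_portable() {
  # 1. Join the file into a single line so text content never starts a line.
  # 2. Break that line before every "<", so each tag begins its own line.
  # 3. Print the leading name on lines that start with a letter or "_";
  #    lines starting with "/", "?" or "!" (closing tags, declarations,
  #    comments) do not match and are dropped.
  tr '\n' ' ' < "$1" |
    tr '<' '\n' |
    sed -n 's|^\([A-Za-z_][^[:space:]>/]*\).*|\1|p'
}
```

Usage would be extract_tags_portable xmlfile. For anything beyond simple files, a real XML parser is the safer choice.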
Upvotes: 0