Reputation: 63
The task:
I'm trying to get the attribute values from XML tags with a shell script, split each value up, and save the parts in a .csv file.
This is what the XML looks like:
<host>
<servers>
<server name="Type1Name1-Port1" >...</server>
<server name="Type2Name2-Port2" >...</server>
<server name="Type3Name3-Port3" >...</server>
...
<server name="TypexNamex-Portx" >...</server>
</servers>
</host>
I'd like to get the values of the "name" attribute and split them up as follows:
Type;Name;Port
The output csv file I want should look like this:
Type1;Name1;Port1
Type2;Name2;Port2
Type3;Name3;Port3
...
Typex;Namex;Portx
The problem:
I can use any shell-language I want to. I prefer bash and ksh.
My questions:
EDIT:
Example data of a server-name:
T-TTT_AAA-A-SSS-PPPP
Where T represents the Type, A the Application name, S the Server name, and P the Port. The lengths of T, A and S are variable. The length of P is constant.
Upvotes: 2
Views: 2503
Reputation: 20032
Without xmllint you can parse input like
<host>
<servers>
<server name="Type1_Name1-Port1" >...</server>
<server name="Type-2_Name2-Port2" >...</server>
<server name="Type3_Name-3-Port3" >...</server>
</servers>
</host>
with
sed -n '/<server name=/ s/[^"]*"\([^_]*\)_\([^"]*\)-\([^"]*\)".*/\1;\2;\3/p' inputfile
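As a quick check, the command can be run against the sample above. This is only a sketch: the output file name servers.csv is a placeholder I chose, not something from the question.

```shell
# Recreate the sample input shown above.
cat > inputfile <<'EOF'
<host>
<servers>
<server name="Type1_Name1-Port1" >...</server>
<server name="Type-2_Name2-Port2" >...</server>
<server name="Type3_Name-3-Port3" >...</server>
</servers>
</host>
EOF

# Split the first quoted value at its first "_" and its last "-".
# servers.csv is a hypothetical output file name.
sed -n '/<server name=/ s/[^"]*"\([^_]*\)_\([^"]*\)-\([^"]*\)".*/\1;\2;\3/p' inputfile > servers.csv

cat servers.csv
```

Hyphens inside the type or the name survive because only the last "-" before the closing quote is treated as the port separator.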
Upvotes: 1
Reputation: 5072
Here is what I came up with, using only common tools, xmllint and sed:
echo 'cat //host/servers/server/@name' | xmllint --shell data.xml | sed -n 's: name=\"\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)\":\1;\2;\3:p'
The sed part is written to match the OP's examples at the time of posting.
Breakdown:
echo 'cat //host/servers/server/@name': we pass this command to xmllint. It selects the name attribute of every node matching <host><servers><server ...> ... </server></servers></host>.
xmllint --shell data.xml: opens data.xml in xmllint's interactive shell, which executes the commands it reads from standard input.
sed -n 's: name=\"\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)\":\1;\2;\3:p': we process the output of xmllint to keep only the data we are interested in. xmllint will produce lines such as name="Type1Name1-Port1". The regex matches a capital letter followed by lowercase letters or digits (for Type), another capital letter followed by lowercase letters or digits (for Name), and any characters between the - and the closing " character (for Port).
Output:
Type1;Name1;Port1
Type2;Name2;Port2
Type3;Name3;Port3
Typex;Namex;Portx
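To try the sed stage without an XML file at hand, it can be fed lines shaped like the attribute lines described in the breakdown. This is a sketch under assumptions: the " -------" separator lines are my assumption about what xmllint prints between matches, and split_names is a hypothetical helper name.

```shell
# Lines shaped like the attribute lines xmllint prints; the " -------"
# separators are an assumption about its output between matches.
xmllint_output=' name="Type1Name1-Port1"
 -------
 name="Type2Name2-Port2"
 -------
 name="TypexNamex-Portx"'

# Hypothetical helper wrapping the sed expression from the answer,
# with ";" as the field separator. LC_ALL=C keeps [A-Z] and [a-z]
# limited to ASCII letters regardless of the user's locale.
split_names() {
  LC_ALL=C sed -n 's: name="\([A-Z][a-z0-9]*\)\([A-Z][a-z0-9]*\)-\(.*\)":\1;\2;\3:p'
}

printf '%s\n' "$xmllint_output" | split_names
```

Because of -n and the trailing p flag, lines that do not match the pattern (such as the separators) are simply not printed.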
EDIT:
To fit the pattern you indicated in the comments, you'll just have to change the sed regex, for instance:
sed -n 's: name=\"\(.*\)_\(.*\)-\(.\{4\}\)\":\1;\2;\3:p'
This will match the format T-TTT_AAA-A-SSS-PPPP, with any length for the type and server name. Try fiddling with the regex, or ask another question under the regex tag, if this is not exactly what you need.
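For illustration, here is how that regex splits one line in the T-TTT_AAA-A-SSS-PPPP format. The sample value and the function name split_edit_format are made up for this sketch.

```shell
# Hypothetical sample line in the format given in the question's EDIT.
line=' name="T-TTT_AAA-A-SSS-PPPP"'

# Everything before the "_" is the type; the last four characters
# before the closing quote are the port.
split_edit_format() {
  sed -n 's: name="\(.*\)_\(.*\)-\(.\{4\}\)":\1;\2;\3:p'
}

printf '%s\n' "$line" | split_edit_format
# prints: T-TTT;AAA-A-SSS;PPPP
```

The application and server names stay together in the middle field, since the hyphens inside them give the regex no unambiguous place to split.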
Upvotes: 1